# Week 1: Overview of Python

**Sources:**

- Python for Marketing Research and Analytics. J. Schwarz, C. Chapman, and E.M. Feit. Springer 2020.

## 1. A quick tour of Python data analysis capabilities

This dataset contains observations from a simple sales and product satisfaction survey. It has the answers of 500 (simulated) consumers to a survey with 4 items asking about satisfaction with a product (iProdSAT), satisfaction with the sales experience (iSalesSAT), and likelihood to recommend the product and the salesperson (iProdREC and iSalesREC, respectively). Each respondent is also assigned to a numerically coded segment (Segment). Later in this section, we convert Segment to a categorical variable.

In [None]:
import pandas as pd

sat_df = pd.read_csv('chapter2_data.csv')
sat_df.head()

# The dataset represents results from a sales and product satisfaction survey (500 consumers)
# iProdSAT: product satisfaction
# iSalesSAT: experience with the salesperson
# Segment: each respondent is assigned a numerically coded segment
# iProdREC: likelihood to recommend the product
# iSalesREC: likelihood to recommend the same salesperson


In [None]:
sat_df.describe()

Next, we compute the pairwise correlations between the variables with corr() and plot them as a heatmap. The satisfaction items are highly correlated with one another, as are the likelihood-to-recommend items.

In [None]:
import seaborn as sns
sns.heatmap(sat_df.corr())
# The satisfaction items are highly correlated with one another, as are the likelihood-to-recommend items.

Does product satisfaction differ by segment? We compute the mean satisfaction for each segment using the groupby() method:

In [None]:
# Does product satisfaction differ by segment? 
sat_df.groupby('Segment').iProdSAT.mean()

In [None]:
sat_df.Segment = sat_df.Segment.astype(pd.api.types.CategoricalDtype())
sat_df.head()

Segment 4 has the highest level of satisfaction, but are the differences statistically significant? We perform a one-way analysis of variance (ANOVA) and see in the PR(>F) column that satisfaction differs significantly by segment:

In [None]:
# Segment 4 has the highest level of satisfaction, but are the differences statistically significant? 
# We perform a one-way analysis of variance (ANOVA)
import statsmodels.formula.api as smf
from statsmodels.stats import anova as sms_anova 

segment_psat_lm = smf.ols('iProdSAT ~ -1 + Segment',
                          data=sat_df).fit() 

sms_anova.anova_lm(segment_psat_lm)

We plot the coefficients from the ANOVA model together with their confidence intervals to visualize the mean product satisfaction by segment:

In [None]:
# We plot the coefficients and confidence intervals from the ANOVA model to visualize confidence intervals
# for mean product satisfaction by segment

import matplotlib.pyplot as plt
plt.errorbar(y=segment_psat_lm.params.index,
             x=segment_psat_lm.params.values,
             xerr=segment_psat_lm.conf_int()[1].T
                  - segment_psat_lm.params,
             fmt='ko')

## 2. Basic Python Types

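- Almost all entities in Python are objects (strings, classes, functions, etc.)
- Python is a dynamically typed language: types belong to objects rather than to variable names, which means that (1) a name can be rebound to an object of a different type (e.g. from a number to a string), and (2) many basic operators are overloaded and behave according to the types of their operands (e.g. the + operator adds numbers but concatenates strings and lists)
- Python is nevertheless strongly typed: it does not silently convert between unrelated types, so an expression such as 1 + '1' raises a TypeError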



### 2.1 Numeric Types

- Python has three built-in numeric types: int, float, and complex
- int: whole numbers; ints are exact and can be arbitrarily large
- float: floating-point numbers, i.e. approximations of real numbers. Unlike ints, floats can represent decimal values, but only with limited precision
- complex: complex numbers, which include an imaginary component
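
The three numeric types can be illustrated with a minimal sketch. The variable names and values are made up for illustration and do not come from the course dataset.

```python
# int: a whole number
n = 42

# float: a number with a decimal component
price = 3.14

# complex: a real part plus an imaginary part (written with the suffix j)
z = 2 + 3j

print(type(n), type(price), type(z))

# Mixing an int and a float in arithmetic produces a float
total = n + price
print(type(total))
```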

In [None]:
# int


In [None]:
# float


### 2.2 Sequence Types

- Python has three basic sequence types
- Each sequence type is an ordered collection of objects
- The three types: lists, tuples, and ranges

#### 2.2.1 Lists

- Lists are ordered, mutable sequences of objects
- Defined with square brackets []

- Adding two lists with the + operator concatenates them

- Use the append() method to add an element to the end of the list
- We pass the element we want to add as an argument to the method append(object)

- Lists can contain a mix of types (e.g. int and strings)

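- Use the sort() method to sort the list in place, numerically or alphabetically (the elements must be comparable with one another, so a list that mixes numbers and strings cannot be sorted)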

- We can find the length of the list using Python's built-in len() function
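
The list operations described above can be sketched as follows. The list names and values are illustrative assumptions, not part of the original notebook.

```python
x = [3, 1, 2]            # a list is defined with square brackets
y = [5, 4]

z = x + y                # concatenation: [3, 1, 2, 5, 4]
z.append(0)              # add one element at the end: [3, 1, 2, 5, 4, 0]

mixed = [1, 'two', 3.0]  # elements may have different types

z.sort()                 # sorts in place: [0, 1, 2, 3, 4, 5]
print(z)
print(len(z))            # number of elements in z
```

Note that `x` and `y` are unchanged by the concatenation, since `x + y` builds a new list.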

- A list has an index that starts at 0
- The index represents the position of the element in the list: the element with index 0 is the first element, and the element with index 1 is the second element
- The following line of code returns the second element
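
A minimal sketch of indexing, assuming a small example list `x` whose values are chosen only for illustration:

```python
x = [10, 20, 30, 40, 50]

first = x[0]    # index 0 is the first element
second = x[1]   # index 1 is the second element
print(second)   # 20
```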


- We can index a range of values using the operator ":"
- In the code below, we retrieve elements starting from index 2 up to and not including 4 (i.e. 2 and 3)
- In Python, the lower bound is inclusive and the upper bound is exclusive
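
For example, with an illustrative list `x` (the values are assumptions made for this sketch):

```python
x = [10, 20, 30, 40, 50]

# Elements at index 2 and 3; index 4 is not included
middle = x[2:4]
print(middle)  # [30, 40]
```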

- If we want to start indexing from the beginning of the list, the start index does not need to be specified

- We can index the list all the way to the end by not specifying the end index

- Negative indices are relative to the end of the list
- The code below retrieves the last two elements of the list x
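
The open-ended and negative slices described above can be sketched as follows, again using an illustrative list `x`:

```python
x = [10, 20, 30, 40, 50]

head = x[:3]       # from the beginning up to (not including) index 3
tail = x[2:]       # from index 2 to the end of the list
last_two = x[-2:]  # the last two elements

print(head)        # [10, 20, 30]
print(tail)        # [30, 40, 50]
print(last_two)    # [40, 50]
```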

- In a programming language that does not support negative indices the way Python does, we would compute the same positions from the length of the list, as follows:
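
One way to write this, sketched in Python with an illustrative list `x`, is to derive the start position from `len()`:

```python
x = [10, 20, 30, 40, 50]

# Start two positions before the end, computed explicitly from the length
last_two = x[len(x) - 2:]
print(last_two)  # [40, 50]
```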

- Lists are mutable
- Mutable means that we can append elements and substitute elements (i.e. change the content)
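
A short sketch of mutability, using made-up values:

```python
x = [10, 20, 30]

x[0] = 99      # substitute the first element
x.append(40)   # append a new element at the end

print(x)       # [99, 20, 30, 40]
```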


#### 2.2.2 Tuples

- Tuples are similar to lists with one major caveat: they're immutable
- Tuples are defined with parentheses ()

- We index tuples just like lists

- Attempting to modify a tuple leads to an error
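
The tuple behavior described above can be sketched as follows. The tuple `t` is an illustrative assumption, and the `try`/`except` is used only so that the example runs to completion instead of stopping at the error.

```python
t = (1, 2, 3)      # a tuple is defined with parentheses

first = t[0]       # indexing works as with lists
rest = t[1:]       # slicing a tuple returns a tuple

try:
    t[0] = 99      # tuples are immutable, so this raises a TypeError
except TypeError as err:
    error_message = str(err)
    print('Cannot modify a tuple:', error_message)
```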

#### 2.2.3 Ranges

- Ranges are immutable sequences of numbers
- Mostly used with for loops (see the Control Flow section below)
- A range takes up to three positional arguments (start, stop, and step)
- In the code below, we create a range object. The range of values starts at 5, stops at 30 (not inclusive), and increases in steps of 2
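
A minimal sketch of this range. Converting it to a list is done here only to make the values visible:

```python
r = range(5, 30, 2)   # start=5, stop=30 (exclusive), step=2

values = list(r)      # materialize the range to inspect its values
print(values)         # [5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29]
```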

- Only the stop argument is required
- If only the stop argument is provided, the range starts at 0 and increments by 1 up to (but not including) that value
- In the following code, we specify stop=10, so we get a range of numbers starting at 0 up to 10 (not inclusive, we stop at 9)

- In the following code, we start at 2 and end at 12 (not inclusive, we stop at 11)
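
Both forms can be sketched as follows. The variable names are illustrative, and `list()` is used only to display the values.

```python
# Only stop is given: start defaults to 0 and step defaults to 1
zero_to_nine = list(range(10))
print(zero_to_nine)     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# start and stop are given: step defaults to 1
two_to_eleven = list(range(2, 12))
print(two_to_eleven)    # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```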

### 2.3 Text Sequence Type

- Python has a type for text: str (string)
- Strings can be specified using 'single', "double", or '''triple''' quotes
- The following code concatenates two string objects using the + operator

- We can index the string similar to lists using square brackets []
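
A short sketch of string concatenation and indexing. The strings are illustrative assumptions.

```python
greeting = 'Hello'
name = "world"

message = greeting + ', ' + name   # concatenation with +
print(message)                     # Hello, world

first_letter = message[0]          # 'H'
first_word = message[:5]           # 'Hello'
```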

- String objects have many string-specific methods
- In the following code, we use the lower() method to convert the letters to lowercase, and the upper() method to convert them to UPPERCASE
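
For example, with an illustrative string `s`:

```python
s = 'Hello World'

lowered = s.lower()   # 'hello world'
uppered = s.upper()   # 'HELLO WORLD'

print(lowered)
print(uppered)
```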

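- Strings are immutable, so string methods return a new string rather than changing the original
- We can use the replace() method to obtain a copy of the string in which a substring has been replaced
- In the following line of code, we replace the 'lo' portion in the string 'hello' with the letter 'p'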
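
A minimal sketch of this replacement. The variable names are illustrative. Note that `replace()` returns a new string and leaves the original object untouched.

```python
word = 'hello'

new_word = word.replace('lo', 'p')
print(new_word)   # help
print(word)       # hello (the original string is unchanged)
```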

- A list of strings can be joined on another string using the join() method

- A string can be split on a delimiter using the split() method
- In the following code, we split the string 'Hello, world, what, a, day!' on the comma delimiter ','
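
Joining and splitting can be sketched as follows. The list `words` is an illustrative assumption. Splitting on `','` alone keeps the space that follows each comma.

```python
words = ['what', 'a', 'day']

# Join the list elements, using a single space as the separator
joined = ' '.join(words)
print(joined)   # what a day

# Split a string on the comma delimiter
parts = 'Hello, world, what, a, day!'.split(',')
print(parts)    # ['Hello', ' world', ' what', ' a', ' day!']
```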

- The format() method is used to insert values from variables into a string
- The substitution locations are specified using {}
- The values to be substituted in are passed as arguments to format()

- We can also specify names for each substitution
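
Both forms of substitution can be sketched as follows. The values are made up for illustration.

```python
customer = 'Ana'
count = 3

# Positional substitution: each {} is filled in order
positional = '{} bought {} apples'.format(customer, count)
print(positional)   # Ana bought 3 apples

# Named substitution: each {name} is filled by the matching keyword argument
named = '{who} bought {n} apples'.format(who=customer, n=count)
print(named)        # Ana bought 3 apples
```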

### 2.4 Booleans

- A boolean (or bool) can have only one of two values: `True` or `False`
- Bools are often produced from comparisons
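
A minimal sketch of comparisons that produce bools. The variable names are illustrative.

```python
is_equal = 1 == 1      # is 1 equal to 1?
is_less = 1 < 2        # is 1 less than 2?
is_same = 1 == 2       # is 1 equal to 2?

print(is_equal)   # True
print(is_less)    # True
print(is_same)    # False
```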




In [None]:
# Is 1 equal to 1?

In [None]:
# Is 1 less than 2?

In [None]:
# Is 1 equal to 2?

- We can save the boolean in a bool object

- Bools can also be combined using the `and`, `or`, and `not` operators
- Bools are used a lot in control statements (see section below)
- Bools are also used when indexing dataframes (in future class sessions)
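
Storing and combining bools can be sketched as follows. The variable names are illustrative assumptions.

```python
# Save the result of a comparison in a bool object
a = 1 < 2     # True
b = 1 == 2    # False

both = a and b     # True only if both are True
either = a or b    # True if at least one is True
negated = not b    # flips the value

print(both, either, negated)   # False True True
```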

### 2.5 Mapping Types (Dictionaries)

- Dictionaries, or dicts, are data structures that use one object (the key) to index another object (the value)
- Lists and tuples can store any object, but only with an integer index
- A dictionary stores two kinds of objects: keys and values
- Dictionaries are very efficient: looking up a value by its key takes roughly constant time on average, regardless of the size of the dictionary
- In the following line of code, we create a dictionary using the `dict()` function
  - Keys: a, b, and c
  - Values: 1, 2, 3
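
A minimal sketch of this construction. The name `d` is an illustrative choice.

```python
# Keyword arguments become string keys
d = dict(a=1, b=2, c=3)
print(d)   # {'a': 1, 'b': 2, 'c': 3}
```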

- We can also define a dictionary using curly brackets

- Just like indices for lists and tuples, keys are passed using square brackets

- The key-value pairs can be accessed directly as tuples using `items()` method

- Keys in a dictionary can be accessed using `keys()` method

- Values in a dictionary can be accessed using `values()` method
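
The dictionary operations above can be sketched as follows. The dictionary contents are illustrative, and the views returned by `items()`, `keys()`, and `values()` are wrapped in `list()` only to display them.

```python
d = {'a': 1, 'b': 2, 'c': 3}   # defined with curly brackets

value_b = d['b']               # look up a value by its key
print(value_b)                 # 2

pairs = list(d.items())        # key-value pairs as tuples
keys = list(d.keys())
values = list(d.values())

print(pairs)    # [('a', 1), ('b', 2), ('c', 3)]
print(keys)     # ['a', 'b', 'c']
print(values)   # [1, 2, 3]
```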

## 3. Control Flow

- Control flow is the order in which the statements in the program are evaluated
- Two types of control flow: conditionals and loops


### 3.1 If statement

- Conditional statements use boolean conditions to create branch points in the code
- The condition is assessed using boolean logic: if the result is True, the indented block that follows is executed; if it is False, that block is skipped
- In the following piece of code, the condition x > 2 evaluates to True, since the value of x is 5 and 5 is greater than 2

- In the following piece of code, the condition evaluates to False because 0 is not greater than 2
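
The two cases above can be sketched as follows, assuming the condition being tested is `x > 2`. The printed messages are illustrative.

```python
x = 5
if x > 2:
    print('x is greater than 2')   # executed, because 5 > 2 is True

x = 0
if x > 2:
    print('x is greater than 2')   # skipped, because 0 > 2 is False
```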

- `if` statements often include a paired `else` statement
- In the following piece of code, the `else` statement will be executed since 0 is not greater than 2
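
A minimal sketch of an `if`/`else` pair. The variable `result` and its messages are illustrative assumptions.

```python
x = 0

if x > 2:
    result = 'x is greater than 2'
else:
    result = 'x is not greater than 2'   # this branch runs, since 0 > 2 is False

print(result)
```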

- There is also an `elif` (else if) statement, whose condition is evaluated only if all previous `if` and `elif` conditions evaluated to False
- In the following code, x is neither greater than 2 nor equal to 2, so the `else` block is executed; `else` runs only when all the previous `if` and `elif` conditions evaluate to False
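
A sketch of the full `if`/`elif`/`else` chain, assuming x is 0 as in the earlier examples. The messages are illustrative.

```python
x = 0

if x > 2:
    result = 'x is greater than 2'
elif x == 2:
    result = 'x is equal to 2'
else:
    result = 'x is less than 2'   # reached because both conditions above are False

print(result)
```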

### 3.2 For loop statement

- A `for` loop iterates through an *iterable*, which is a collection of objects: e.g. lists, tuples, strings, and sets; the loop obtains an *iterator* from it to step through the values
- In the following code, we iterate through a collection of integer values in the list `a`.
- An *iterator* is an object that contains a countable number of values (source: W3 Schools)
- An *iterator* is an object that can be iterated upon, meaning that you can traverse through all the values (source: W3 Schools).
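
A basic `for` loop over a list can be sketched as follows. The contents of `a` are illustrative. The first loop prints the square of each value, and the second collects the squares in a new list instead.

```python
a = [1, 2, 3, 4]

# Print the square of each value in a
for value in a:
    print(value ** 2)

# Save the results in a list instead of printing them
squares = []
for value in a:
    squares.append(value ** 2)

print(squares)   # [1, 4, 9, 16]
```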
 

[For loop flow chart](https://cdn.techbeamers.com/wp-content/uploads/2018/08/Regular-Python-for-loop-flowchart.png)

In [None]:
# given a list a, find the square of each value in a


In [None]:
# we can also save the result in a list instead of printing it


- We can also iterate through a sequence of numbers produced by the `range()` function
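
For example, a sketch that loops over `range(5)` and accumulates a running total. The variable `total` is an illustrative addition.

```python
total = 0

for i in range(5):   # i takes the values 0, 1, 2, 3, 4
    print(i)
    total += i

print(total)         # 10
```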

- The `zip()` function "zips" together two collections and iterates through a pair of values
- In the following code, we define two ranges, and then we iterate through pairs of values using the `zip()` function

- If one of the collections in the `zip()` function is shorter, then the iteration proceeds for the length of the shorter collection
- In the following, `range(6)` produces 6 values and `range(6,12,2)` produces 3 values, thus the output of the `zip()` function will be of length 3 (since 3 is less than 6)
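
Both cases can be sketched as follows. The first pair of ranges is an illustrative assumption, and the second pair uses the ranges named above.

```python
# Two ranges of equal length, iterated pairwise
pairs = []
for i, j in zip(range(3), range(3, 6)):
    pairs.append((i, j))
print(pairs)       # [(0, 3), (1, 4), (2, 5)]

# Collections of unequal length: iteration stops with the shorter one
shorter = list(zip(range(6), range(6, 12, 2)))
print(shorter)     # [(0, 6), (1, 8), (2, 10)]
```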

- The `enumerate()` function returns not only each value from a collection, but also its index
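
A minimal sketch with an illustrative list `letters`:

```python
letters = ['a', 'b', 'c']

# Each iteration yields an (index, value) pair
for index, value in enumerate(letters):
    print(index, value)

indexed = list(enumerate(letters))
print(indexed)   # [(0, 'a'), (1, 'b'), (2, 'c')]
```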

### 3.3 List Comprehension

- List comprehension is a concise syntax for generating a list from another list
- In the following example, the code takes a list of numbers as input and produces a new list where each element has been incremented by one
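
A sketch of this example, with an equivalent explicit loop shown for comparison. The list `numbers` is an illustrative assumption.

```python
numbers = [1, 2, 3]

# List comprehension: increment each element by one
incremented = [n + 1 for n in numbers]
print(incremented)   # [2, 3, 4]

# Equivalent explicit loop
loop_result = []
for n in numbers:
    loop_result.append(n + 1)

print(loop_result)   # [2, 3, 4]
```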


- Instead of (1) instantiating an empty list, (2) writing a `for` statement to iterate through the source list, and (3) writing statements to append to the new list, all of those operations can be done in a single line using list comprehension
- List comprehension is in the following form

`newlist = [expression for item in iterable if condition == True]`

- The return value is a new list, leaving the old list unchanged (source: W3 Schools)

- The *iterable* can be any iterable object, like a list, tuple, set, etc. (source: W3 Schools)
- The *expression* is the current item in the iteration, but it is also the outcome, which you can manipulate before it ends up as a list item in the new list (source: W3 Schools)


- The *condition* is like a filter that only accepts the items that evaluate to `True` (source: W3 Schools)
- The condition below, `x < 12`, keeps only the elements that are less than 12 in the result
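
A minimal sketch of a filtering condition. The list `values` is illustrative.

```python
values = [5, 10, 12, 15, 8]

# Keep only the elements that are less than 12
small = [x for x in values if x < 12]
print(small)   # [5, 10, 8]
```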


- If we want different behavior depending on a particular condition (an if-else expression), we place the `if else` before the `for`
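
For example, a sketch that labels each element instead of filtering it. The labels and values are illustrative assumptions.

```python
values = [5, 12, 15]

# The if-else expression comes before the for and chooses the output per element
labels = ['low' if x < 12 else 'high' for x in values]
print(labels)   # ['low', 'high', 'high']
```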

- The following code generates a list of tuples, where each tuple pairs a number with its square

- In the following code, we iterate over a list of tuples
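
Generating a list of tuples and then iterating over it can be sketched as follows. The names and the choice of `range(4)` are illustrative.

```python
# Build a list of (number, square) tuples
squares = [(n, n ** 2) for n in range(4)]
print(squares)   # [(0, 0), (1, 1), (2, 4), (3, 9)]

# Iterate over the tuples, unpacking each pair into two names
sums = [n + sq for n, sq in squares]
print(sums)      # [0, 2, 6, 12]
```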

- We can generate dictionaries as follows using dictionary comprehension
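
A minimal sketch of a dictionary comprehension, mapping each number to its square. The name and range are illustrative.

```python
# key: value pairs inside curly brackets, followed by the for clause
square_of = {n: n ** 2 for n in range(4)}
print(square_of)   # {0: 0, 1: 1, 2: 4, 3: 9}
```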

### 3.4 While loop statement


- Loops allow one to run the same code repeatedly while systematically changing specific variables
- `while` loops run the code repeatedly as long as the loop condition is True
- In the code below, we initialize the variable x to 0
- The variable x is used to control the flow of the loop
- The following loop produces a sequence of integers starting at 0 up to 5 (not inclusive, so the count stops at 4)
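
A minimal sketch of such a loop:

```python
x = 0             # initialize the control variable

while x < 5:      # the loop runs as long as this condition is True
    print(x)      # prints 0, 1, 2, 3, 4
    x += 1        # without this update the loop would never end
```

After the loop finishes, `x` is 5, which is the first value for which the condition is False.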



- The following loop runs as long as i is less than len(a), i.e. the length of the list
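
A sketch of such a loop, assuming an illustrative list `a` and an index variable `i` that starts at 0. The list `visited` is added only to record which elements were reached.

```python
a = [1, 2, 3, 4]
i = 0
visited = []

while i < len(a):        # stop once i reaches the length of the list
    print(a[i])
    visited.append(a[i])
    i += 1
```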