<a href="https://colab.research.google.com/github/floriandendorfer/demand-estimation/blob/main/python%20tutorial%20sep%2019/tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### OUTLINE of Python tutorial for ECO 380:
- We will start with a brief introduction of arithmetic operations and data structures
- We will import the data set for Graded Problem Set 1 using the pandas library
- We will do some simple data manipulation (e.g. calculate some means, standard deviations, basic OLS) so that you have the tools you need for your Graded Problem Sets.

If you have any questions about the Graded Problem Set or the contents of this tutorial, I am here to help you during my office hours on Tuesdays, 10-11 am, in GE 076.

### SECTION 1: INTRO TO PYTHON

#### BASIC ARITHMETIC OPERATIONS

In [1]:
# Define variables (no type declaration is needed; Python is an object-oriented programming language)
a = 12
b = 2

# print output
print("The sum of a and b is", a + b)
print("The product of a and b is", a*b)
print("a to the power of b is", a**b) # exponentiation represented by double * symbol: **
print("a divided by b is", a/b)

The sum of a and b is 14
The product of a and b is 24
a to the power of b is 144
a divided by b is 6.0
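
Note that `a/b` printed `6.0` rather than `6`: the `/` operator always returns a float. The short sketch below contrasts it with floor division (`//`) and the remainder operator (`%`), two standard Python operators not used above. The variable names are chosen only for illustration.

```python
a = 12
b = 2

true_division = a / b    # always returns a float, even when the result is a whole number
floor_division = a // b  # rounds down to the nearest integer and keeps the int type
remainder = a % b        # what is left over after floor division

print("a / b =", true_division, "with type", type(true_division).__name__)
print("a // b =", floor_division, "with type", type(floor_division).__name__)
print("a % b =", remainder)
```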


#### DATA STRUCTURES:
- **Lists** are ordered, mutable (modifiable) collections. You might create a list when you need a collection of items where the order matters but you might want to update some values in the list.
- **Tuples** are ordered but immutable (unmodifiable) collections. You should use tuples when you want to ensure the data cannot be modified after creation (e.g. a fixed vector of values that later feeds into vector/matrix operations).
- **Dictionaries** map keys to values. We will not have time to cover them in this tutorial, but they are useful when you want to map names to numbers, or when you want to store information that can be looked up using an identifier key.
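
Since dictionaries are not covered further, here is a minimal sketch of the "look up by key" idea. The product names and prices are hypothetical and only serve as an illustration.

```python
# Hypothetical example: map product names (keys) to prices (values)
prices = {"product_1": 5.51, "product_2": 6.48}

# Look up a value using its key
print("Price of product_1:", prices["product_1"])

# Dictionaries are mutable: add a new key or overwrite an existing one
prices["product_3"] = 7.25
prices["product_1"] = 5.99

# Loop over all key-value pairs
for name, price in prices.items():
    print(name, "costs", price)
```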

#### 1. LISTS

In [2]:
numbers = [1, 2, 3, 4, 5] # avoid calling a variable `list`, which would shadow Python's built-in list type

# Print original list
print("Original list:", numbers)

# Access elements in the list using index. Note: python indices begin at 0!
print("Extract first element of list:", numbers[0])

# Add element to list to show how lists are mutable
numbers.append(6)
print("Updated list after appending a 6th element:", numbers)

Original list: [1, 2, 3, 4, 5]
Extract first element of list: 1
Updated list after appending a 6th element: [1, 2, 3, 4, 5, 6]


In [None]:
# Remove element from list by value
numbers.remove(3)
print("Remove element with value 3 from list:", numbers)

# pop vs. del:
del numbers[2] # remember: third position is indexed using a 2 because indexing starts at 0 in python
print("Remove element in third position using del:", numbers)

popped_value = numbers.pop(2) # unlike del, pop returns the element it removes
print("Remove element in third position using pop:", numbers)
print("Popped element, which is currently stored in the variable popped_value:", popped_value)
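
All of the operations above change a list in place. For contrast, the sketch below tries the same kind of change on a tuple, which is immutable and therefore raises a `TypeError`. The variable names are illustrative.

```python
numbers_list = [1, 2, 3]
numbers_tuple = (1, 2, 3)

numbers_list[0] = 10  # works: lists can be changed in place
print("Modified list:", numbers_list)

try:
    numbers_tuple[0] = 10  # fails: tuples do not support item assignment
except TypeError as error:
    print("Cannot modify tuple:", error)

print("Tuple is unchanged:", numbers_tuple)
```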

#### 2. TUPLES (+ introducing for loops)

In [None]:
tuple1 = (1, 2, 3)
tuple2 = (4, 5, 6)
tuple3 = (7, 8, 9)

# Idea: I want to pull the first element of each tuple, but since I have 3 of them, it will be easier for me to print
# by iterating through a list of my tuples

tuples = [tuple1, tuple2, tuple3] # creates a list of tuples

for i, t in enumerate(tuples): # i is the index of the tuple we are currently operating on, while t is the tuple itself (tuple1, tuple2 or tuple3)
    print(f"First element of tuple{i+1}: {t[0]}") # an `f-string` evaluates the expressions inside {} and inserts the results into the printed text


##### Detailed breakdown of the for loop:

1. **`enumerate(tuples)`**:
    * `enumerate()` is a built-in Python function that allows you to loop over an iterable (in this case, the list `tuples`), while keeping track of the index of each item.
    * `tuples` is a list containing `tuple1`, `tuple2`, and `tuple3`. The `enumerate()` function will return pairs of an index and the tuple at that index.
    * For example, in the first iteration, `i = 0` and `t = tuple1`, and in the second iteration, `i = 1` and `t = tuple2`.

2. **`i, t`**:
    * `i` is the index of the current item in the list `tuples`. It starts at `0` for the first item, `1` for the second, and so on.
    * `t` is the current tuple itself from the list `tuples` (e.g., `tuple1` in the first iteration, `tuple2` in the second).

3. **`f"First element of tuple{i+1}: {t[0]}"`**:
    * This is an **f-string**, which allows you to embed expressions inside curly braces `{}` and have them evaluated within the string.
    * `i+1`: Since `i` starts from `0`, we add `1` to display the tuple number starting from `1` (so it prints "tuple1", "tuple2", etc.).
    * `t[0]`: This accesses the first element of the current tuple `t`. For example, `t[0]` would be `1` for `tuple1` and `4` for `tuple2`.

* **First iteration:**
    * `i = 0`, `t = tuple1 = (1, 2, 3)`
    * `print(f"First element of tuple1: {t[0]}")` → `"First element of tuple1: 1"`

* **Second iteration:**
    * `i = 1`, `t = tuple2 = (4, 5, 6)`
    * `print(f"First element of tuple2: {t[0]}")` → `"First element of tuple2: 4"`

... and so on.
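
As a possible variation on the loop described above, `enumerate()` accepts an optional `start` argument, so the counter can begin at `1` and the `i+1` adjustment is no longer needed. The list `first_elements` is an illustrative addition that collects the values as they are printed.

```python
tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

first_elements = []
for number, t in enumerate(tuples, start=1):  # the counter now starts at 1 instead of 0
    print(f"First element of tuple{number}: {t[0]}")
    first_elements.append(t[0])  # keep the value so it can be reused later

print("All first elements:", first_elements)
```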

#### VECTOR AND MATRIX OPERATIONS

The easiest way to perform more complicated mathematical operations is to use a library such as `numpy`, which has built-in methods designed for this purpose.

In [5]:
# Load numpy library
import numpy as np # imports are conventionally placed at the top of a script or notebook

Below are some examples of vector and matrix operations using python:

In [None]:
# Element-by-element addition
# convert tuples to numpy array:
vec1 = np.array(tuple1)
vec2 = np.array(tuple2)
vec3 = np.array(tuple3)
vec_sum = np.add(vec1, vec2)

print("Element-by-element addition of [1, 2, 3] and [4, 5, 6] is:", vec_sum)

In [None]:
# Element-by-element multiplication
vec_hadamard = vec1 * vec2
print("Element-by-element multiplication of [1, 2, 3] and [4, 5, 6] is:", vec_hadamard)

# Dot product
vec_dot = np.dot(vec1, vec2)
print("Dot product is:", vec_dot)
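
The two operations in this cell are closely related: the dot product is the sum of the element-by-element products. A minimal sketch that checks this, re-creating the two vectors so that it runs on its own:

```python
import numpy as np

vec1 = np.array([1, 2, 3])
vec2 = np.array([4, 5, 6])

elementwise = vec1 * vec2       # element-by-element products
manual_dot = elementwise.sum()  # adding them up gives the dot product

print("Element-by-element products:", elementwise)
print("Sum of the products:", manual_dot)
print("Same as np.dot:", manual_dot == np.dot(vec1, vec2))
```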

Creating matrices from vectors:

In [None]:
# Create matrix by binding row vectors together
matrix = np.array([vec1, vec2, vec3])
print("3x3 matrix of row-bound vectors")
print(matrix)

# Create a matrix by stacking the row vectors and then transpose (column bind)
matrix2 = np.array([vec1, vec2, vec3]).T
print("3x3 matrix of column-bound vectors using transpose operation")
print(matrix2)

# Another way to column bind:
matrix3 = np.column_stack((vec1, vec2, vec3))
print("3x3 matrix of column-bound vectors using column stack method in numpy")
print(matrix3)
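
To confirm that the two column-binding approaches give the same result, one option is to compare the arrays directly and inspect their shapes. This is a sketch with illustrative variable names; the vectors are re-created so that it runs on its own.

```python
import numpy as np

vec1 = np.array([1, 2, 3])
vec2 = np.array([4, 5, 6])
vec3 = np.array([7, 8, 9])

row_bound = np.array([vec1, vec2, vec3])         # each vector becomes a row
col_bound = np.column_stack((vec1, vec2, vec3))  # each vector becomes a column

print("Shape of the matrix:", row_bound.shape)
print("Transpose equals column stack:", np.array_equal(row_bound.T, col_bound))
print("First row of row-bound matrix:", row_bound[0])
print("First column of column-bound matrix:", col_bound[:, 0])
```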

Example of matrix multiplication using `@` operator

In [None]:
# Multiplying a 3x3 matrix by a length-3 vector returns a length-3 vector (a 1-D numpy array)
matx = matrix @ vec1
print("Row-bound matrix multiplied with vec1:", matx)

# Multiply two 3x3 matrices to get a 3x3 matrix
matx2 = matrix @ matrix2
print("Row-bound matrix multiplied with column-bound matrix:")
print(matx2)
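
Two details are easy to miss here. First, a 1-D array such as `vec1` has shape `(3,)`, so the product with a 3x3 matrix also has shape `(3,)`; reshaping the vector to `(3, 1)` gives an explicit column vector. Second, `*` is element-by-element multiplication, not matrix multiplication. A small sketch of both points, with illustrative names:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
vec = np.array([1, 2, 3])

product_1d = matrix @ vec       # 1-D input gives a 1-D result
column = vec.reshape(3, 1)      # explicit 3x1 column vector
product_2d = matrix @ column    # 2-D input gives a 3x1 result

print("Shape with 1-D vector:", product_1d.shape)
print("Shape with column vector:", product_2d.shape)

# `*` multiplies element by element (each row of the matrix times vec)
elementwise = matrix * vec
print("Element-by-element product:")
print(elementwise)
```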

### SECTION 2: DATA ANALYSIS

To import a CSV file in a Jupyter Notebook using Python, you can use the `pandas` library, which provides a convenient way to read CSV files into a DataFrame.

In [3]:
# Load pandas library
import pandas as pd

# numpy was already imported in Section 1; uncomment the next line if you run this section on its own
# import numpy as np

In [4]:
# Clone the GitHub repo that contains GPS1_data.csv for GPS question 4 (the leading `!` runs a shell command from the notebook)
!git clone https://github.com/floriandendorfer/demand-estimation.git demand-estimation-clone

Cloning into 'demand-estimation-clone'...
remote: Enumerating objects: 96, done.
remote: Counting objects: 100% (96/96), done.
remote: Compressing objects: 100% (88/88), done.
remote: Total 96 (delta 31), reused 29 (delta 5), pack-reused 0 (from 0)
Receiving objects: 100% (96/96), 125.29 KiB | 833.00 KiB/s, done.
Resolving deltas: 100% (31/31), done.


In [5]:
mktdata = pd.read_csv('demand-estimation-clone/python-tutorial-sep-19/GPS1_data.csv', index_col=0) # load csv file from the cloned github repo, using the first column as the row index
mktdata.head(10) # preview first 10 rows of the dataset

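|   | month | product | price | sales |
|---|-------|---------|-------|-------|
| 0 | 1 | 1 | 5.51 | 32 |
| 1 | 1 | 2 | 6.48 | 36 |
| 2 | 2 | 1 | 8.83 | 26 |
| 3 | 2 | 2 | 7.66 | 38 |
| 4 | 3 | 1 | 5.9 | 33 |
| 5 | 3 | 2 | 4.7 | 44 |
| 6 | 4 | 1 | 8.33 | 18 |
| 7 | 4 | 2 | 8.53 | 34 |
| 8 | 5 | 1 | 7.24 | 43 |
| 9 | 5 | 2 | 9.28 | 22 |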
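
Beyond `head()`, a few attributes help you check what was loaded. The sketch below rebuilds only the ten rows previewed above as a stand-in DataFrame called `sample` (an illustrative name), so it runs without the CSV file; the same calls apply to `mktdata`.

```python
import pandas as pd

# Stand-in for mktdata: only the ten rows shown in the preview
sample = pd.DataFrame({
    "month":   [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "product": [1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
    "price":   [5.51, 6.48, 8.83, 7.66, 5.9, 4.7, 8.33, 8.53, 7.24, 9.28],
    "sales":   [32, 36, 26, 38, 33, 44, 18, 34, 43, 22],
})

print("Rows and columns:", sample.shape)
print("Column names:", sample.columns.tolist())
print("Data type of each column:")
print(sample.dtypes)

# Select a single column by name, or a single cell by row label and column name
price_column = sample["price"]
first_price = sample.loc[0, "price"]
print("First price:", first_price)
```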


The `help()` function can be used to get detailed information about functions, methods, classes and modules. You can pass any Python object to the `help()` function to get its documentation.

In [6]:
# Call `help` function
help(mktdata.head)

Help on method head in module pandas.core.generic:

head(n: 'int' = 5) -> 'Self' method of pandas.core.frame.DataFrame instance
    Return the first `n` rows.
    
    This function returns the first `n` rows for the object based
    on position. It is useful for quickly testing if your object
    has the right type of data in it.
    
    For negative values of `n`, this function returns all rows except
    the last `|n|` rows, equivalent to ``df[:n]``.
    
    If n is larger than the number of rows, this function returns all rows.
    
    Parameters
    ----------
    n : int, default 5
        Number of rows to select.
    
    Returns
    -------
    same type as caller
        The first `n` rows of the caller object.
    
    See Also
    --------
    DataFrame.tail: Returns the last `n` rows.
    
    Examples
    --------
    >>> df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
    ...                    'monkey', 'parrot', 'shark', 'whale', 'zebra']})
    >
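
The help text describes three cases for `n`: a positive value, a negative value, and a value larger than the number of rows. A minimal sketch of each, using a small made-up DataFrame (`df` and its contents are illustrative only):

```python
import pandas as pd

# Small illustrative DataFrame with five rows
df = pd.DataFrame({"price": [5.51, 6.48, 8.83, 7.66, 5.9]})

first_three = df.head(3)          # the first 3 rows
all_but_last_two = df.head(-2)    # negative n: all rows except the last 2
more_than_exists = df.head(100)   # n larger than the number of rows: all rows

print("head(3) returns", len(first_three), "rows")
print("head(-2) returns", len(all_but_last_two), "rows")
print("head(100) returns", len(more_than_exists), "rows")
```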

Let's calculate some basic summary statistics for each of the columns in our dataframe. The easiest way is to use the `.describe()` method from the `pandas` library to quickly summarize the distribution of each of the variables in our dataframe:

In [None]:
# Summarize data using describe() method
summary_mktdata = mktdata.describe()
print("\nSummary statistics:")
print(summary_mktdata)
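
`describe()` summarizes each column over all rows. If you need the same statistics separately for each product, one possible approach is `groupby()`. The sketch below uses only the ten rows shown in the preview as a stand-in for `mktdata`, so its numbers are not those of the full dataset.

```python
import pandas as pd

# Stand-in for mktdata: only the ten rows shown in the preview
sample = pd.DataFrame({
    "product": [1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
    "price":   [5.51, 6.48, 8.83, 7.66, 5.9, 4.7, 8.33, 8.53, 7.24, 9.28],
    "sales":   [32, 36, 26, 38, 33, 44, 18, 34, 43, 22],
})

# Mean and standard deviation of price, computed separately for each product
price_by_product = sample.groupby("product")["price"].agg(["mean", "std"])
print(price_by_product.round(3))
```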

You can also compute individual summary statistics directly from a column:

In [7]:
# Compute mean price
mean_price = mktdata['price'].mean()
print("Mean price is:", round(mean_price, 3))
# print("Mean price is:", mean_price.round(3) )

# Compute standard deviation of price
std_price = mktdata['price'].std()
print("Standard deviation of price is:", round(std_price,3))

# Compute median (50%)
med_price = mktdata['price'].median()
print("Median price is:", med_price.round(3))

Mean price is: 7.38
Standard deviation of price is: 1.511
Median price is: 7.25
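
One detail worth knowing: the pandas `.std()` method computes the sample standard deviation (dividing by n - 1), whereas `np.std()` on a numpy array divides by n by default. The sketch below computes both by hand on a few illustrative prices taken from the preview, not on the full dataset.

```python
import numpy as np
import pandas as pd

# A few illustrative prices (the first five rows of the preview)
prices = pd.Series([5.51, 6.48, 8.83, 7.66, 5.9])
n = len(prices)

squared_deviations = (prices - prices.mean()) ** 2
sample_std = np.sqrt(squared_deviations.sum() / (n - 1))  # divide by n - 1
population_std = np.sqrt(squared_deviations.sum() / n)    # divide by n

print("Manual sample std:", round(sample_std, 3))
print("pandas .std():", round(prices.std(), 3))
print("Manual population std:", round(population_std, 3))
print("numpy np.std():", round(np.std(prices.to_numpy()), 3))
```

If you need the population version in pandas, `.std(ddof=0)` changes the divisor to n.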


In [None]:
# Is this the same as what we found before?

issame_mean = summary_mktdata.loc['mean','price'].round(3) == mean_price.round(3)
print("The means that we found using both methods are the same:", issame_mean)

issame_std = summary_mktdata.loc['std','price'].round(3) == std_price.round(3)
print("The standard deviations that we calculated are the same:", issame_std)

issame_med = summary_mktdata.loc['50%','price'].round(3) == med_price.round(3)
print("The medians that we calculated are the same:", issame_med)
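
Rounding both sides before comparing with `==` works here, but a common alternative for comparing floating-point numbers is `np.isclose()`, which tests equality up to a small tolerance. A minimal sketch on a few illustrative prices:

```python
import numpy as np
import pandas as pd

# A few illustrative prices (the first five rows of the preview)
prices = pd.Series([5.51, 6.48, 8.83, 7.66, 5.9])

mean_from_describe = prices.describe().loc["mean"]
mean_direct = prices.mean()
means_match = np.isclose(mean_from_describe, mean_direct)
print("Means match:", means_match)

# Why a tolerance helps: floating-point arithmetic is not exact
exact_comparison = (0.1 + 0.2 == 0.3)
tolerant_comparison = np.isclose(0.1 + 0.2, 0.3)
print("0.1 + 0.2 == 0.3 gives", exact_comparison)
print("np.isclose(0.1 + 0.2, 0.3) gives", tolerant_comparison)
```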


Let's learn to run a regression in Python. To do this, you will need a statistical modelling package like `statsmodels`.
If you don't already have `statsmodels` installed, you can create a new cell in your Jupyter Notebook and run the magic command `%pip install statsmodels` directly from your notebook (alternatively, you can open a terminal window and run `pip install statsmodels` without the `%`).

In [None]:
%pip install statsmodels

In [18]:
# Load statsmodels library
import statsmodels.api as sm

In [None]:
X = mktdata['price'] # Independent variable
y = mktdata['sales'] # Dependent variable

X = sm.add_constant(X) # Add constant to the regression model (intercept)

model = sm.OLS(y, X).fit() # Fit the regression model

# Print model summary
print(model.summary())
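
To see what the regression computes, the OLS coefficients can also be obtained by hand with `numpy` from the normal equations, beta = (X'X)^(-1) X'y. This is a sketch on the ten preview rows only, so the estimates will differ from those for the full dataset; it is not a replacement for `statsmodels`, which also reports standard errors and test statistics.

```python
import numpy as np

# Ten preview rows only (stand-in for the full dataset)
price = np.array([5.51, 6.48, 8.83, 7.66, 5.9, 4.7, 8.33, 8.53, 7.24, 9.28])
sales = np.array([32, 36, 26, 38, 33, 44, 18, 34, 43, 22], dtype=float)

# Add a column of ones for the intercept, as sm.add_constant does
X = np.column_stack((np.ones(len(price)), price))

# Solve the normal equations (X'X) beta = X'y
beta = np.linalg.solve(X.T @ X, X.T @ sales)
intercept, slope = beta

fitted = X @ beta
residuals = sales - fitted

print("Intercept:", round(intercept, 3))
print("Slope on price:", round(slope, 3))
print("Sum of residuals (should be close to 0):", round(residuals.sum(), 6))
```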



You may want to export this table as LaTeX code that you can copy and paste into your LaTeX editor when delivering your solutions for the problem set:

In [None]:
# Generate LaTeX code from the summary of the regression
output = model.summary2().as_latex()

print(output)
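
Instead of copying the printed code, you could save it to a `.tex` file and include it in your document with `\input{...}`. The sketch below uses a placeholder string in place of the real `output`, a hypothetical file name, and a temporary folder so that it leaves no files in your working directory.

```python
from pathlib import Path
import tempfile

# Placeholder standing in for the string returned by model.summary2().as_latex()
output = "\\begin{table}\n% regression table goes here\n\\end{table}\n"

# Hypothetical file name; replace the temporary folder with your own project folder
output_dir = Path(tempfile.mkdtemp())
tex_path = output_dir / "regression_table.tex"

tex_path.write_text(output)  # write the LaTeX code to the file
print("LaTeX table saved to:", tex_path)
```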