### OUTLINE of Python tutorial for ECO 380: 
- We will start with a brief introduction to arithmetic operations and data structures
- We will import the data set for Graded Problem Set 1 using the pandas library
- We will do some simple data manipulation (e.g. calculate some means, standard deviations, basic OLS) so that you have the tools you need for your Graded Problem Sets. 

If you have any questions about the Graded Problem Set or the contents of this tutorial, I am available during my office hours on Tuesdays, 10-11 am in GE 076. 

### SECTION 1: INTRO TO PYTHON

#### BASIC ARITHMETIC OPERATIONS

In [13]:
# Assign values to variables (in Python every value is an object, and no type declaration is needed)
a = 12
b = 2

# print output
print("The sum of a and b is", a + b)
print("The product of a and b is", a*b) 
print("a to the power of b is", a**b) # exponentiation represented by double * symbol: **
print("a divided by b is", a/b)

The sum of a and b is 14
The product of a and b is 24
a to the power of b is 144
a divided by b is 6.0
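
Note that `/` returned `6.0` (a float) even though both inputs are integers. Two related operators are floor division (`//`) and modulo (`%`). The short sketch below reuses the same values of `a` and `b`; the divisor `5` in the last line is just an illustrative choice.

```python
a = 12
b = 2

true_div = a / b      # true division always returns a float
floor_div = a // b    # floor division drops the fractional part
remainder = a % 5     # modulo: remainder after dividing a by 5

print("a / b =", true_div)
print("a // b =", floor_div)
print("a % 5 =", remainder)
```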


#### DATA STRUCTURES: 
- **Lists** are ordered, mutable (modifiable) collections. Use a list when the order of the items matters and you may need to add, remove, or update values later. 
- **Tuples** are ordered but immutable (unmodifiable) collections. Use a tuple when you want to ensure the data cannot be modified after creation (e.g. the entries of a vector that should stay fixed).
- **Dictionaries** map keys to values. We will not have time to cover them in this tutorial, but they are useful when you want to map names to numbers, or when you want to store information that can be looked up using an identifier key. 
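
As a quick illustration of the last two bullets, here is a minimal sketch of a dictionary lookup and of what happens when you try to modify a tuple. The names `prices` and `point` and the values in them are made up for this example and are not part of the problem set data.

```python
# Hypothetical example data: product identifiers mapped to prices
prices = {"product_1": 6.23, "product_2": 5.94}

# Look up a value by its key
print("Price of product_1:", prices["product_1"])

# Dictionaries are mutable: add a new key-value pair
prices["product_3"] = 6.10
print("Number of products:", len(prices))

# Tuples are immutable: item assignment raises a TypeError
point = (1, 2, 3)
try:
    point[0] = 10
except TypeError as err:
    error_message = str(err)
    print("Cannot modify a tuple:", error_message)
```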

#### 1. LISTS 

In [24]:
my_list = [1, 2, 3, 4, 5] # avoid naming a variable `list`: that would shadow Python's built-in list type

# Print original list
print("Original list:", my_list)

# Access elements in the list using index. Note: python indices begin at 0!
print("Extract first element of list:", my_list[0]) 

# Add element to list to show how lists are mutable
my_list.append(6)
print("Updated list after appending a 6th element:", my_list)

Original list: [1, 2, 3, 4, 5]
Extract first element of list: 1
Updated list after appending a 6th element: [1, 2, 3, 4, 5, 6]


In [25]:
# Remove element from list by value
my_list.remove(3)
print("Remove element with value 3 from list:", my_list)

# pop vs. del: both remove by position, but pop also returns the removed value
del my_list[2] # remember: third position is indexed using a 2 because indexing starts at 0 in python
print("Remove element in third position using del:", my_list)

popped_value = my_list.pop(2)
print("Remove element in third position using pop:", my_list)
print("Popped element, which is currently stored in the variable popped_value:", popped_value)

Remove element with value 3 from list: [1, 2, 4, 5, 6]
Remove element in third position using del: [1, 2, 5, 6]
Remove element in third position using pop: [1, 2, 6]
Popped element, which is currently stored in the variable popped_value: 5


#### 2. TUPLES (+ introducing for loops)

In [27]:
tuple1 = (1, 2, 3)
tuple2 = (4, 5, 6)
tuple3 = (7, 8, 9)

# Idea: I want to pull the first element of each tuple, but since I have 3 of them, it will be easier for me to print
# by iterating through a list of my tuples

tuples = [tuple1, tuple2, tuple3] # creates a list of tuples

# i is the index of the current tuple in the list; t is the tuple itself (tuple1, then tuple2, then tuple3)
for i, t in enumerate(tuples):
    # An f-string evaluates the expressions inside {} and inserts the results into the printed text
    print(f"First element of tuple{i+1}: {t[0]}")

First element of tuple1: 1
First element of tuple2: 4
First element of tuple3: 7


##### Detailed breakdown of the for loop:

1. **`enumerate(tuples)`**:
    * `enumerate()` is a built-in Python function that allows you to loop over an iterable (in this case, the list `tuples`), while keeping track of the index of each item.
    * `tuples` is a list containing `tuple1`, `tuple2`, and `tuple3`. The `enumerate()` function will return pairs of an index and the tuple at that index.
    * For example, in the first iteration, `i = 0` and `t = tuple1`, and in the second iteration, `i = 1` and `t = tuple2`.

2. **`i, t`**:
    * `i` is the index of the current item in the list `tuples`. It starts at `0` for the first item, `1` for the second, and so on.
    * `t` is the current tuple itself from the list `tuples` (e.g., `tuple1` in the first iteration, `tuple2` in the second).

3. **`f"First element of tuple{i+1}: {t[0]}"`**:
    * This is an **f-string**, which allows you to embed expressions inside curly braces `{}` and have them evaluated within the string.
    * `i+1`: Since `i` starts from `0`, we add `1` to display the tuple number starting from `1` (so it prints "tuple1", "tuple2", etc.).
    * `t[0]`: This accesses the first element of the current tuple `t`. For example, `t[0]` would be `1` for `tuple1` and `4` for `tuple2`.

* **First iteration:**
    * `i = 0`, `t = tuple1 = (1, 2, 3)`
    * `print(f"First element of tuple1: {t[0]}")` → `"First element of tuple1: 1"`

* **Second iteration:**
    * `i = 1`, `t = tuple2 = (4, 5, 6)`
    * `print(f"First element of tuple2: {t[0]}")` → `"First element of tuple2: 4"`

... and so on.
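
To see the (index, item) pairs described in the breakdown, you can wrap `enumerate()` in `list()`. The sketch below also uses the optional `start` argument of `enumerate()`, which lets the counter begin at 1 so that the `i+1` adjustment is not needed. The list `labels` is a helper introduced only for this example.

```python
tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# enumerate() yields (index, item) pairs; list() makes them visible
pairs = list(enumerate(tuples))
print(pairs)

# start=1 makes the counter begin at 1, so no i+1 is needed in the f-string
labels = []
for i, t in enumerate(tuples, start=1):
    labels.append(f"First element of tuple{i}: {t[0]}")
    print(labels[-1])
```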

#### VECTOR AND MATRIX OPERATIONS

The easiest way to perform more complicated mathematical operations is to use a library such as `numpy`, which has built-in methods designed for this purpose. 

In [29]:
# Load numpy library
import numpy as np # usually libraries are loaded as the first line of a script

Below are some examples of vector and matrix operations using python:

In [36]:
# Element-by-element addition
# convert tuples to numpy array:
vec1 = np.array(tuple1)
vec2 = np.array(tuple2)
vec3 = np.array(tuple3)
vec_sum = np.add(vec1, vec2)

print("Element-by-element addition of [1, 2, 3] and [4, 5, 6] is:", vec_sum)

Element-by-element addition of [1, 2, 3] and [4, 5, 6] is: [5 7 9]


In [38]:
# Element-by-element multiplication 
vec_hadamard = vec1 * vec2
print("Element-by-element multiplication of [1, 2, 3] and [4, 5, 6] is:", vec_hadamard)

# Dot product 
vec_dot = np.dot(vec1, vec2)
print("Dot product is:", vec_dot)

Element-by-element multiplication of [1, 2, 3] and [4, 5, 6] is: [ 4 10 18]
Dot product is: 32
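
The two results above are related: the dot product is the sum of the element-by-element products (4 + 10 + 18 = 32). A small sketch that checks this, rebuilding the two vectors so the cell runs on its own:

```python
import numpy as np

vec1 = np.array([1, 2, 3])
vec2 = np.array([4, 5, 6])

# Dot product = sum of the element-by-element products
dot_manual = (vec1 * vec2).sum()
dot_numpy = np.dot(vec1, vec2)
dot_operator = vec1 @ vec2  # @ also gives the dot product for two 1-D arrays

print(dot_manual, dot_numpy, dot_operator)
```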


Creating matrices from vectors:

In [40]:
# Create matrix by binding row vectors together
matrix = np.array([vec1, vec2, vec3])
print("3x3 matrix of row-bound vectors")
print(matrix)

# Create a matrix by stacking the row vectors and then transpose (column bind)
matrix2 = np.array([vec1, vec2, vec3]).T
print("3x3 matrix of column-bound vectors using transpose operation")
print(matrix2)

# Another way to column bind: 
matrix3 = np.column_stack((vec1, vec2, vec3))
print("3x3 matrix of column-bound vectors using column stack method in numpy")
print(matrix3)

3x3 matrix of row-bound vectors
[[1 2 3]
 [4 5 6]
 [7 8 9]]
3x3 matrix of column-bound vectors using transpose operation
[[1 4 7]
 [2 5 8]
 [3 6 9]]
3x3 matrix of column-bound vectors using column stack method in numpy
[[1 4 7]
 [2 5 8]
 [3 6 9]]


Example of matrix multiplication using the `@` operator:

In [43]:
# Multiplying a 3x3 matrix by a length-3 vector returns a length-3 vector
matx = matrix @ vec1
print("Row-bound matrix multiplied with vec1:", matx)

# Multiply two 3x3 matrices to get a 3x3 matrix
matx2 = matrix @ matrix2
print("Row-bound matrix multiplied with column-bound matrix:")
print(matx2)

Row-bound matrix multiplied with vec1: [14 32 50]
Row-bound matrix multiplied with column-bound matrix:
[[ 14  32  50]
 [ 32  77 122]
 [ 50 122 194]]
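
A common source of mistakes is confusing `@` (matrix multiplication) with `*` (element-by-element multiplication). The sketch below contrasts the two and prints the `.shape` of each array. Note that `vec1` here is a 1-D array of length 3 rather than an explicit 3x1 column, which is why the result of `matrix @ vec1` is also 1-D.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
vec1 = np.array([1, 2, 3])

print("Shape of matrix:", matrix.shape)   # (rows, columns)
print("Shape of vec1:", vec1.shape)       # 1-D array with 3 entries

# @ performs matrix multiplication: each row of matrix is dotted with vec1
product_matmul = matrix @ vec1

# * multiplies element by element, broadcasting vec1 across every row
product_elementwise = matrix * vec1

print(product_matmul)
print(product_elementwise)
```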


### SECTION 2: DATA ANALYSIS

To import a CSV file in a Jupyter Notebook using Python, you can use the `pandas` library, which provides a convenient way to read CSV files into a DataFrame. 

In [None]:
# Load pandas library
import pandas as pd

# numpy was imported above as np; uncomment the next line if you are starting from a fresh kernel
# import numpy as np

In [44]:
# Import GPS1_data.csv for GPS question 4
mktdata = pd.read_csv('GPS1_data.csv')
mktdata.head(10) # preview first 10 lines of the dataset


| Unnamed: 0.1 | Unnamed: 0 | month | product | price | sales | market share | s | s0 |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 1.0 | 6.2347 | 18.38438 | 0.510277 | 0.183844 | 0.639717 |
| 1 | 1 | 1 | 2.0 | 5.940261 | 17.643883 | 0.489723 | 0.176439 | 0.639717 |
| 2 | 2 | 2 | 1.0 | 6.061581 | 18.958083 | 0.522593 | 0.189581 | 0.637231 |
| 3 | 3 | 2 | 2.0 | 6.013753 | 17.318857 | 0.477407 | 0.173189 | 0.637231 |
| 4 | 4 | 3 | 1.0 | 5.954426 | 19.352801 | 0.532703 | 0.193528 | 0.636706 |
| 5 | 5 | 3 | 2.0 | 6.109423 | 16.976633 | 0.467297 | 0.169766 | 0.636706 |
| 6 | 6 | 4 | 1.0 | 6.135976 | 18.675881 | 0.514707 | 0.186759 | 0.637155 |
| 7 | 7 | 4 | 2.0 | 5.930201 | 17.608606 | 0.485293 | 0.176086 | 0.637155 |
| 8 | 8 | 5 | 1.0 | 6.141834 | 18.740894 | 0.520758 | 0.187409 | 0.640123 |
| 9 | 9 | 5 | 2.0 | 6.05723 | 17.246837 | 0.479242 | 0.172468 | 0.640123 |
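
A column named `Unnamed: 0` usually means the file was saved with its row index included, so the first column has no header. If you would rather treat that column as the row index, `pd.read_csv` accepts an `index_col` argument. The sketch below uses a small made-up CSV held in memory (`csv_text`) instead of the course file, so the values are illustrative only.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a file that was saved with its row index
csv_text = ",month,product,price\n0,1,1.0,6.23\n1,1,2.0,5.94\n"

# Default read: the unnamed first column is loaded as "Unnamed: 0"
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns.tolist())

# index_col=0 tells pandas to use that first column as the row index instead
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed.columns.tolist())
```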


The `help()` function displays detailed information about functions, methods, classes and modules. You can pass any Python object to `help()` to see its documentation.

In [None]:
# Call `help` function
help(mktdata.head)

Let's calculate some basic summary statistics for each of the columns in our dataframe. The easiest way is to use the `.describe()` method from the `pandas` library to quickly summarize the distribution of each of the variables in our dataframe: 

In [46]:
# Summarize data using describe() method
summary_mktdata = mktdata.describe()
print("\nSummary statistics:")
print(summary_mktdata)


Summary statistics:
       Unnamed: 0       month     product       price       sales  \
count  104.000000  104.000000  104.000000  104.000000  104.000000   
mean    51.500000   26.500000    1.500000    6.097447   17.999482   
std     30.166206   15.081011    0.502421    0.148531    0.790692   
min      0.000000    1.000000    1.000000    5.606402   16.391871   
25%     25.750000   13.750000    1.000000    6.014579   17.310008   
50%     51.500000   26.500000    1.500000    6.100690   17.949316   
75%     77.250000   39.250000    2.000000    6.173355   18.703793   
max    103.000000   52.000000    2.000000    6.488639   19.607887   

       market share           s          s0  
count    104.000000  104.000000  104.000000  
mean       0.500000    0.179995    0.640010  
std        0.021072    0.007907    0.004456  
min        0.455333    0.163919    0.630717  
25%        0.481558    0.173100    0.637197  
50%        0.500000    0.179493    0.639650  
75%        0.518442    0.187038    

You can also compute individual summary statistics directly: 

In [55]:
# Compute mean price
mean_price = mktdata['price'].mean()
print("Mean price is:", round(mean_price, 3))
# print("Mean price is:", mean_price.round(3) )

# Compute standard deviation of price
std_price = mktdata['price'].std()
print("Standard deviation of price is:", round(std_price,3))

# Compute median (50%)
med_price = mktdata['price'].median()
print("Median price is:", med_price.round(3))

Mean price is: 6.097
Standard deviation of price is: 0.149
Median price is: 6.101
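
One detail to keep in mind when computing standard deviations by hand: pandas' `.std()` computes the sample standard deviation (dividing by n - 1) by default, while numpy's `np.std()` computes the population standard deviation (dividing by n) unless you pass `ddof=1`. The sketch below uses a few made-up prices, not the course data, to show the difference.

```python
import numpy as np
import pandas as pd

# Made-up prices, not the course data
prices = pd.Series([6.0, 6.1, 5.9, 6.2])

std_pandas = prices.std()                              # sample std (divides by n - 1)
std_numpy = np.std(prices.to_numpy())                  # population std (divides by n)
std_numpy_sample = np.std(prices.to_numpy(), ddof=1)   # ddof=1 matches pandas

print(std_pandas, std_numpy, std_numpy_sample)
```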


In [68]:
# Is this the same as what we found before? 

issame_mean = summary_mktdata.loc['mean','price'].round(3) == mean_price.round(3)
print("The means that we found using both methods are the same:", issame_mean)

issame_std = summary_mktdata.loc['std','price'].round(3) == std_price.round(3)
print("The standard deviations that we calculated are the same:", issame_std)

issame_med = summary_mktdata.loc['50%','price'].round(3) == med_price.round(3)
print("The medians that we calculated are the same:", issame_med)


The means that we found using both methods are the same: True
The standard deviations that we calculated are the same: True
The medians that we calculated are the same: True
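
Rounding both sides before comparing with `==` works here. Another option for comparing floating-point numbers is `np.isclose()`, which checks whether two values agree within a small tolerance. A minimal sketch with made-up values:

```python
import numpy as np

# Made-up values that differ only by floating-point noise
value_a = 0.1 + 0.2
value_b = 0.3

exact_match = value_a == value_b            # exact comparison of binary floats
close_match = np.isclose(value_a, value_b)  # comparison within a small tolerance

print("Exact equality:", exact_match)
print("np.isclose:", close_match)
```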


Let's learn to run a regression in Python. To do this, you will need a statistical modelling package like `statsmodels`. 
If you don't already have `statsmodels` installed, you can create a new cell in your Jupyter Notebook and run the magic command `%pip install statsmodels` directly from your notebook (alternatively, you can open a terminal window and run `pip install statsmodels` without the `%`). 

In [73]:
%pip install statsmodels


Note: you may need to restart the kernel to use updated packages.


In [72]:
# Load statsmodels library
import statsmodels.api as sm

In [76]:
X = mktdata['price'] # Independent variable
y = mktdata['sales'] # Dependent variable

X = sm.add_constant(X) # Add constant to the regression model (intercept)

model = sm.OLS(y, X).fit() # Fit the regression model

# Print model summary
print(model.summary())



                            OLS Regression Results                            
Dep. Variable:                  sales   R-squared:                       0.009
Model:                            OLS   Adj. R-squared:                 -0.001
Method:                 Least Squares   F-statistic:                    0.8908
Date:                Tue, 10 Sep 2024   Prob (F-statistic):              0.347
Time:                        11:57:38   Log-Likelihood:                -122.19
No. Observations:                 104   AIC:                             248.4
Df Residuals:                     102   BIC:                             253.7
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         21.0198      3.201      6.567      0.0
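
For intuition about what the `const` and slope coefficients represent, the same kind of least-squares fit can be sketched with numpy alone. The example below is an illustration on synthetic data generated from a known line (it does not use the course data and is not how `statsmodels` is implemented internally). The column of ones plays the same role as `sm.add_constant(X)`.

```python
import numpy as np

# Synthetic data generated from a known line: sales = 20 - 0.5 * price
price = np.array([5.8, 5.9, 6.0, 6.1, 6.2, 6.3])
sales = 20.0 - 0.5 * price

# Design matrix with a column of ones for the intercept
X = np.column_stack((np.ones_like(price), price))

# Least-squares solution of X @ beta = sales
beta, residuals, rank, singular_values = np.linalg.lstsq(X, sales, rcond=None)
intercept, slope = beta

print("Intercept:", round(intercept, 3))
print("Slope:", round(slope, 3))
```

Because the synthetic data lie exactly on a line, the fit recovers the intercept and slope used to generate them, up to floating-point error.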

You may want to export this table as LaTeX code that you can copy and paste into your LaTeX editor when delivering your solutions for the problem set:

In [77]:
# Generate LaTeX code from the summary of the regression
output = model.summary2().as_latex()

print(output)

\begin{table}
\caption{Results: Ordinary least squares}
\label{}
\begin{center}
\begin{tabular}{llll}
\hline
Model:              & OLS              & Adj. R-squared:     & -0.001    \\
Dependent Variable: & sales            & AIC:                & 248.3818  \\
Date:               & 2024-09-10 11:58 & BIC:                & 253.6706  \\
No. Observations:   & 104              & Log-Likelihood:     & -122.19   \\
Df Model:           & 1                & F-statistic:        & 0.8908    \\
Df Residuals:       & 102              & Prob (F-statistic): & 0.347     \\
R-squared:          & 0.009            & Scale:              & 0.62586   \\
\hline
\end{tabular}
\end{center}

\begin{center}
\begin{tabular}{lrrrrrr}
\hline
      &   Coef. & Std.Err. &       t & P$> |$t$|$ &  [0.025 &  0.975]  \\
\hline
const & 21.0198 &   3.2009 &  6.5667 &      0.0000 & 14.6707 & 27.3688  \\
price & -0.4953 &   0.5248 & -0.9438 &      0.3475 & -1.5363 &  0.5456  \\
\hline
\end{tabular}
\end{center}

\begin{cent