# Intro to Python

#### Table of Contents

* [Creating Virtual Environments](#Creating-Virtual-Environments)
* [Welcome to Jupyter!](#Welcome-to-Jupyter!)
* [Scalar Types](#Scalar-Types)
* [Type Casting](#Type-Casting)
* [Listing Objects](#Listing-Objects)
* [Data Structures](#Data-Structures)
* [Loops](#Loops)
* [Functions](#Functions)
* [Classes](#Classes)
* [Exporting Notebook](#Exporting-Notebook)
- [NumPy](#NumPy)
    * [Matrices](#Matrices)
    * [Random](#Random)
    * [Math](#Math)
- [Pandas](#Pandas)
    * [Series, DataFrames, and Indices](#Series,-DataFrames,-and-Indices)
    * [Multiple Data Sets](#Multiple-Data-Sets)
    * [Real Data](#Real-Data)
    * [Pivoting](#Pivoting)
- [Matplotlib](#Matplotlib)
- [Seaborn](#Seaborn)
    - [Other Plots](#Other-Plots)
- [Time Series](#Time-Series)

***

# Creating Virtual Environments

In `Anaconda Prompt`, create and activate the environment:
```
conda create -n AMLinEcon tensorflow
conda activate AMLinEcon
```
To switch between virtual environments:
```
conda activate base
```
To download packages to desired virtual environment:
```
conda activate AMLinEcon
conda install pandas
```
An alternative way to install packages is through Anaconda Navigator itself. 
Click on the `Environments` tab, click on the virtual environment you wish to use (`AMLinEcon`), and change the dropdown menu from "Installed" to "All". 
Search for the package name. 
If it has a green checkbox, it is already installed. 
Otherwise, click the box and click `Apply` in the bottom right.
Anaconda will perform a check for packages to modify. 
Once it is done, click `Apply`.

*Note that we can perform all of our* `Git` *operations in Anaconda Prompt*.

**********************************************************************************

# Welcome to Jupyter!

Jupyter is a graphical user interface (GUI) for Python.
It has a Markdown-style text editor to combine code and text. 
Together, these make up the *notebook* part of Jupyter Notebooks. 
However, I would encourage you to use JupyterLab, as the classic Notebook interface is older and becoming obsolete.

When dealing with text or code cells, there are two modes: 

1. command - modify cells
2. edit - modify cell content

Here are some keyboard shortcuts:

| Mode    | Shortcut | Description |
|--------:|:--------:|:------------|
| Edit    | `Esc` | Enter command mode |
| Command | `A` | Insert cell above highlighted cell |
| Command | `B` | Insert cell below highlighted cell |
| Command | `Y` | Change cell to Code |
| Command | `M` | Change cell to Markdown |
| Command | `R` | Change cell to Raw |
| Command | `C` | Copy cell |
| Command | `X` | Cut cell |
| Command | `V` | Paste cell |
| Command | `D, D` | Delete cell |
| Command | `Enter` | Enter edit mode |
| Either | `Ctrl-Enter` | Execute cell |
| Either | `Shift-Enter` | Execute cell *and* move to the cell below (inserting a Code cell if there is none) |

This is how you type in Markdown.
In the cell, this is a new line.
But it will appear in one line.

This is how you type with *italics* or _italics_. 
This is how you type with **boldface** or __boldface__.
This is how you type with **_boldface italics_** or __*boldface italics*__.

If you want to, you can make a list, but make sure things line up!

* An item
- Another item
  1. You can alternatively use numbers
  * Mix and match!
     * You know it is working when it changes colors!

We can also [link sections](#Intro-to-Python).

***********

It is possible to write code inline: `print('Hello!')`. 

If we alternatively have multiple lines of code we want to write but not execute, we can do this:
```
print('Hello,')
print("world!")
```

***********************************

Don't forget about math!
We can write math inline: $y_i = \alpha + x_i \beta + \epsilon_i$.

But if we have some important math, we can make it stand out:
$$
\sum_{t = 0}^\infty ar^t = \frac{a}{1 - r}
$$

But if we have some serious math, we can use some $\latex$ (if only that worked...)
$$
\begin{align*}
    2 & = 1 + 1\\
    & = 2*1 \tag{factoring}
\end{align*}
$$

*********
Now, let's get to coding!

In [None]:
# bring Python to life!
'Hello, world!'

In [None]:
# Python is just a calculator
1 + 1

In [None]:
# Jupyter (not Python) only displays the value of the last expression in a cell (if applicable)
'Hello, world!'
1+1

In [None]:
# Need print() to show more than the last expression
print('Hello, world!') # printed by print()
1+1                    # displayed because it is the last line

In [None]:
# See?
print('hello')

In [None]:
# A fancy calculator
a = 3
b = 2
c = a ** b
print(c)
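Beyond `+` and `**`, Python has separate operators for true division, floor division, and the remainder. A quick sketch of the common arithmetic operators on two integers (the values are arbitrary):

```python
# Arithmetic operators on two integers
a, b = 7, 2
print(a + b)   # 9: addition
print(a - b)   # 5: subtraction
print(a * b)   # 14: multiplication
print(a / b)   # 3.5: true division always returns a float
print(a // b)  # 3: floor division rounds down
print(a % b)   # 1: remainder (modulo)
print(a ** b)  # 49: exponentiation
```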

******
# Scalar Types
[Top](#Intro-to-Python)

In [None]:
# numbers
print(type(3)) # an integer
print(type(3.14)) # a float
print(type(3.)) # a float

In [None]:
# Booleans
print(type(True))
print(not False)  # negation
print(3.14 > 490) # strictly greater than
print(3. <= 3)    # LESS THAN or EQUAL TO
print(3. == 3)    # EQUAL TO
print((3. <= 3) & (3.14 > 490)) # and
print((3. <= 3) | (3.14 > 490)) # or
print((3. <= 3) ^ (3.14 > 490)) # xor
print(True + 3)
False * 10

In [None]:
# Strings, glorious strings
hello = 'Hello, world!'
print(type(hello))

hello2 = "Hello, world!"
hello == hello2

In [None]:
# There are methods for some objects such as .replace()
s = "This is a string."
s2 = s.replace('string', 'longer string')
print(s2)

In [None]:
l = 'left'
r = 'right'
print(l + r)

In [None]:
# strings can also have special characters:
print('hello\n\thi')

In [None]:
print('hello\\hi')

In [None]:
# r stands for raw
print(r'no\special\characters')
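Raw strings matter mostly for Windows file paths and regular expressions, where backslashes should be taken literally (for example, `'C:\Users'` is a syntax error because `\U` starts a Unicode escape). Comparing lengths shows the difference:

```python
normal = '\n'   # one character: a newline
raw = r'\n'     # two characters: a backslash and the letter n
print(len(normal), len(raw))  # 1 2
print(raw == '\\n')           # True: the raw string equals the escaped version
```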

In [None]:
# Fancy strings
template = '{0:.2f} {1:s} are worth US${2:d}'
# {0:.2f} - first argument as float with two decimal places
# {1:s} - second argument as a string
# {2:d} - third argument as an integer
template.format(1.368, 'Euros', 1)

In [None]:
# printf-style strings (the older % syntax, same result)
e_float = 1.368
e_string = 'Euros'
e_int = 1
print('%.2f %s are worth US$%d' % (e_float, e_string, e_int)) # a tuple
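Since Python 3.6 the same sentence can also be written as an f-string, which places the variables directly inside the braces and accepts the same format specifiers as `.format()`. A minimal sketch reusing the values above:

```python
e_float = 1.368
e_string = 'Euros'
e_int = 1
# f-string: expressions inside {} are evaluated; text after ':' is the format spec
msg = f'{e_float:.2f} {e_string} are worth US${e_int:d}'
print(msg)  # 1.37 Euros are worth US$1
```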

****
# Type Casting
[Top](#Intro-to-Python)

In [None]:
p = '3.14'
fval = float(p)
print(fval)
print(type(fval))
print(int(fval))
print(bool(fval))
print(bool(-0.1)) # anything non-zero-numeric is True
print(bool(0))
print(bool('0'))  # True: any non-empty string is True, even '0'
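The cell above converts `'3.14'` with `float()` before calling `int()` for a reason: `int()` only accepts strings that look like integers. A short check of that behavior:

```python
p = '3.14'
try:
    int(p)  # fails: '3.14' is not an integer literal
except ValueError as err:
    print('ValueError:', err)

print(int(float(p)))  # 3: go through float first, then truncate
print(int('42'))      # 42: integer-looking strings are fine
```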

***
# Listing Objects
[Top](#Intro-to-Python)

In [None]:
%who

In [None]:
%whos 
# I prefer this

****
# Data Structures
[Top](#Intro-to-Python)

In [None]:
###########
## Tuple ##
###########
tup = 1, 2, 3
print(tup)

nested_tup = (1, 2, 3), (4, 5)
print(nested_tup)

print(tup*2, '\n')

a = 1,2,2,2,3,4
print(a.count(2))
print(tuple('string'), '\n')

values = 1, 2, 3, 4, 5
a, b, *_ = values # Discard
print(a, b)

a, b, *remainder = values
print(remainder)

In [None]:
###########
## Lists ##
###########
print(type(remainder))

rng = range(0, 10)
print(rng)
print(list(rng))

$$ CAUTION $$
Python is zero indexed! 
Also, the stop (final) argument is exclusive:

`range(0, 10)` $\equiv \{0, 1, \dots, 9\}$, i.e., the integers in $[0, 10)$
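A quick way to convince yourself of both rules (the list of letters is arbitrary):

```python
r = range(0, 10)
print(len(r))        # 10 elements...
print(r[0], r[-1])   # 0 9: ...starting at 0 and stopping before 10

letters = ['a', 'b', 'c', 'd']
print(letters[1:3])  # ['b', 'c']: index 1 is included, index 3 is excluded
```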

In [None]:
num = list(range(3))
print(num)
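num.append(3)      # add to the end
print(num, '\n')

num.insert(1, 10)  # insert 10 at position 1
print(num)
num.pop(2)         # remove by position (and return the item)
print(num)
num.remove(10)     # remove by value (first match)
print(num, '\n')

print(1 in num)
print(num + num)   # + concatenates lists
num2 = num + num
num2.sort()        # sorts in place and returns None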
print(num2, '\n')

# Indexing
lst = [(1,2), 3, 4, 5]
print(lst[0])
print(lst[0:3])
print(lst[-3:-1])
print(lst[:-1])
print(lst[::-1]) # reverse!

$$CAUTION!!$$
Assignment references the same object!

In [None]:
a = list('minimum wage')
b = a
b[8] = 'r' # modify
print(a)

In [None]:
a = list('blah')
b = a
b = 2 # rebinds b to a new object; a is unchanged
print(a)

In [None]:
a = list('minimum wage')
b = a.copy()
b[8] = 'r'
print(a)
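To check whether two names point to the same object, compare them with `is`. Note also that `.copy()` is *shallow*: for a list of lists, the inner lists are still shared, which is what the standard library's `copy.deepcopy` is for. A small sketch:

```python
import copy

a = list('minimum wage')
alias = a            # same object
shallow = a.copy()   # new list holding the same elements
print(alias is a, shallow is a)  # True False

nested = [[1, 2], [3, 4]]
shallow_nested = nested.copy()
deep_nested = copy.deepcopy(nested)
nested[0][0] = 99    # modify an inner list
print(shallow_nested[0][0])  # 99: the shallow copy shares the inner lists
print(deep_nested[0][0])     # 1: deepcopy copied the inner lists too
```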

In [None]:
#####################
## lists vs tuples ##
#####################
tup = tuple('minimum wage')
tup[8] = 'r' # raises TypeError
# tuples are immutable
# lists are mutable

In [None]:
##########
## Sets ##
##########
type({'blah'})
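Sets are unordered collections of unique elements, which makes them handy for removing duplicates and for set algebra. A short sketch (the values are arbitrary):

```python
words = ['econ', 'ml', 'econ', 'stats']
unique = set(words)   # duplicates are dropped
print(len(unique))    # 3

a = {1, 2, 3}
b = {2, 3, 4}
print(a | b)  # union
print(a & b)  # intersection
print(a - b)  # difference
print(a ^ b)  # symmetric difference

empty = set()  # note: {} creates an empty dict, not an empty set
```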

In [None]:
##################
## Dictionaries ## key: value
##################
d = {'a': 'string',
    'b': (1, 2, '3'),
    1: {'c' : 2}}
print(d)
print(d['b'])
print(d[1]['c'])
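Beyond square-bracket lookup, a few dictionary methods come up constantly: `.get()` returns a default instead of raising `KeyError`, and `.items()` iterates over key-value pairs. A sketch using the same dictionary:

```python
d = {'a': 'string',
     'b': (1, 2, '3'),
     1: {'c': 2}}
print(d.get('z', 'missing'))  # key absent -> the default is returned
print('b' in d)               # True: membership tests the keys
for key, value in d.items():  # iterate over key-value pairs
    print(key, '->', value)
d['z'] = 0                    # add a new key
print(list(d.keys()))         # ['a', 'b', 1, 'z']: insertion order is kept
```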

***
# Loops
[Top](#Intro-to-Python)

In [None]:
# avoid while loops when a for loop will do
i = 0
while i < 5: # if the condition never becomes False (try i > -5), the loop runs forever
    print(i) # stop it with Kernel > Interrupt; unlike Kernel > Restart, this keeps your variables
    i += 1

In [None]:
for i in range(10):
    if i < 5:
        print(i)
    else:
        break

In [None]:
# Creating a dictionary
mapping = {}
h = list('hello')
for i, value in enumerate(h):
    mapping[value] = i
print(mapping, '\n')

rng = range(len(h))
zipped = zip(rng, h)
print(zipped) # a lazy zip object; list(zipped) shows its pairs

for i, (a, b) in enumerate(zip(rng, h)):
    print('{0}: {1}, {2}'.format(i, a, b))

boom = [1, 2]
values = [-1, 0, 1]
for v in values:
    if v not in boom:
        if v > 0:
            print('positive')
        elif v < 0:
            print('negative')
        else:
            print('zero')
    else: 
        print('BOOM!')

In [None]:
####################
## Comprehensions ##
####################
strings = ['applied', 'machine', 'learning', 'in', 'economics']

# list comprehensions
print([something.upper() for something in strings if something.count('a') >= 1])

# set comprehensions
print({len(x) for x in strings})
print(set(map(len, strings))) # equivalent, using map()

# dict comprehensions
print({val: index for index, val in enumerate(strings)}, '\n')
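A list comprehension is shorthand for a loop that appends to a list. The two versions below build the same list:

```python
strings = ['applied', 'machine', 'learning', 'in', 'economics']

# Explicit loop
result = []
for something in strings:
    if something.count('a') >= 1:
        result.append(something.upper())

# Equivalent list comprehension
comp = [something.upper() for something in strings if something.count('a') >= 1]
print(result == comp)  # True
```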

*** 
# Functions
[Top](#Intro-to-Python)

Two primary types:

1. Defined functions (`def`)
   * multiple lines
   * multiple statements
2. Lambda functions 
   * one line
   * syntax: `lambda arguments: expression`
   * "anonymous" function

In [None]:
def floor_div(x, y):
    """
    this function returns the floor divide value and the remainder
    """
    value = x // y
    remainder = x - value*y
    return value, remainder
v, r = floor_div(6,4)
print('Value: ' + str(v) + '\nRemainder:', r)

fl_dv = lambda a, b : a // b
fl_dv(6, 4)
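Python also has a built-in, `divmod()`, that returns the same pair as `floor_div`. Comparing the two, including a negative input where `//` rounds toward negative infinity rather than toward zero:

```python
def floor_div(x, y):
    """Return the floor quotient and remainder of x / y."""
    value = x // y
    remainder = x - value * y
    return value, remainder

print(floor_div(6, 4), divmod(6, 4))    # (1, 2) (1, 2)
print(floor_div(-7, 2), divmod(-7, 2))  # (-4, 1) (-4, 1): -7 // 2 is -4, not -3
```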

In [None]:
floor_div?

In [None]:
floor_div??

***
# Classes
[Top](#Intro-to-Python)

Much like how `a = 2` makes `a` an object (of type `int`), a class lets us define our own type of object, bundling attributes (data) with methods (functions).

In [None]:
class course:
    def __init__(self, dept, number, name): # __init__ runs automatically when an instance is created
        self.dept = dept
        self.number = number
        self.name = name
        
    def is_econ_topic(self):
        if (self.dept == 'ECON') and (self.number == 490):
            print('yep')
        else:
            print('nope')
            
x = course('ECON', 490, 'Applied Machine Learning in Economics')
print(x.number)

x.is_econ_topic()

In [None]:
# `self` is only a naming convention: Python passes the instance
# as the first argument, whatever that argument is called
class course:
    def __init__(first, dept, number, name):
        first.dept = dept
        first.number = number
        first.name = name
        
    def is_econ_topic(somethingelse):
        if (somethingelse.dept == 'ECON') and (somethingelse.number == 490):
            print('yep')
        else:
            print('nope')
            
x = course('ECON', 490, 'Applied Machine Learning in Economics')
print(x.number)

x.is_econ_topic()
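Calling a method through the class makes the hidden first argument explicit: `x.is_econ_topic()` is the same call as `Course.is_econ_topic(x)`. A sketch using a hypothetical `Course` class (capitalized per PEP 8) whose method returns a Boolean instead of printing, so the result can be reused:

```python
class Course:
    def __init__(self, dept, number, name):
        self.dept = dept
        self.number = number
        self.name = name

    def is_econ_topic(self):
        # return a bool rather than printing it
        return self.dept == 'ECON' and self.number == 490

x = Course('ECON', 490, 'Applied Machine Learning in Economics')
print(x.is_econ_topic())        # True
print(Course.is_econ_topic(x))  # True: same call, instance passed explicitly
```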

***
# Exporting Notebook
[Top](#Intro-to-Python)
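
`File > Export Notebook As... > Export Notebook to PDF`

PDF export goes through `nbconvert` and requires a LaTeX installation; other formats such as HTML do not.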

***
# NumPy
[Top](#Intro-to-Python)

**Num**erical **Py**thon
## Matrices
[Top](#Intro-to-Python)

In [None]:
import numpy as np

In [None]:
arr = np.array([[1,2,3],
               [4,5,6]])
print(arr, '\n')

print(arr.T, '\n')

print(arr.shape)
print(arr.ndim)
print(arr.dtype)
print(arr.astype(np.float64).dtype)

In [None]:
# arr[i] selects row i by itself; selecting column j needs arr[:, j]
print(arr[0])
print(arr[0, :])
print(arr[0, 1:])
print(arr[:, ::-1])

In [None]:
print(np.arange(6))
print(np.arange(6, step = 0.5), '\n')

print(np.arange(6).reshape(2, 3)) # by row
print(np.arange(1, 7).reshape(2, 3, order = 'F'), '\n') # by column

print(np.linspace(0, 1, num = 10), '\n')
print(np.linspace(0, 1, num = 4).reshape(2, 2))
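Arithmetic operators on arrays work element-wise; matrix multiplication uses `@` (or `np.dot`). With the 2 x 3 array from above:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])
print(arr * arr)            # element-wise: [[1 4 9] [16 25 36]]
print(arr @ arr.T)          # (2 x 3) @ (3 x 2) -> 2 x 2: [[14 32] [32 77]]
print((arr.T @ arr).shape)  # (3, 3)
```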

*****************************
## Random
[TOP](#Intro-to-Python)

`numpy` is a __*package*__. 
We will use its `random` _**module**_.

In [None]:
# So you can replicate
np.random.seed(490) # sets seed for numpy functions

In [None]:
np.random.random(4)

In [None]:
# It changes if you run it again
np.random.random(4)

In [None]:
# Keep in same cell to ensure consistency
np.random.seed(490)
np.random.random(4)

In [None]:
# Two equivalent ways to import the random module as npr (either line alone is enough)
from numpy import random as npr
import numpy.random as npr

In [None]:
pois = npr.poisson(size = 10) # lam (the mean) defaults to 1.0

In [None]:
print(pois)
print(np.unique(pois), '\n')

unique, counts = np.unique(pois, return_counts = True)
print(dict(zip(unique, counts)), '\n')

print(list(zip(unique, counts)), '\n')

np.array(np.unique(pois, return_counts = True)).T

In [None]:
npr.uniform(low = 3, high = 4, size = (2, 3, 2))

In [None]:
npr.normal(loc = 4, scale = .90, size = (2, 3)) # mean, std
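NumPy's documentation now recommends the `Generator` interface, created with `np.random.default_rng(seed)`, over the legacy `np.random.seed` functions used above. Each generator carries its own state, so draws are reproducible without a global seed. A sketch mirroring the draws above (the two interfaces produce different numbers for the same seed):

```python
import numpy as np

rng = np.random.default_rng(490)      # independent, seeded generator
u = rng.random(4)                     # uniform on [0, 1)
pois = rng.poisson(lam=1.0, size=10)  # lam=1.0 matches the legacy default
unif = rng.uniform(low=3, high=4, size=(2, 3, 2))
norm = rng.normal(loc=4, scale=0.90, size=(2, 3))  # mean, std

# Re-creating the generator with the same seed reproduces the draws
rng2 = np.random.default_rng(490)
print(np.array_equal(u, rng2.random(4)))  # True
```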

******
## Math
[TOP](#Intro-to-Python)

In [None]:
arr = np.array([[1, 1, 1],
               [1,2,3]])
print(arr, '\n')

print(np.mean(arr))
print(np.mean(arr, axis = 0)) # collapse the rows: one mean per column
print(np.mean(arr, axis = 1), '\n') # collapse the columns: one mean per row

print(np.std(arr, axis = 1), '\n')

print(np.log(arr))
np.nan # "not a number": NumPy's marker for missing values
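Two details worth checking: `np.std` divides by n by default (the population standard deviation; pass `ddof=1` for the sample version), and ordinary reductions return `nan` if any element is `np.nan`, while the `np.nan*` variants skip missing values. A sketch with the same array:

```python
import numpy as np

arr = np.array([[1, 1, 1],
                [1, 2, 3]])
print(np.mean(arr, axis=0))    # one mean per column: [1.  1.5 2. ]
print(np.mean(arr, axis=1))    # one mean per row: [1. 2.]
print(np.std(arr[1]))          # divides by n: about 0.816
print(np.std(arr[1], ddof=1))  # divides by n - 1: 1.0

x = np.array([1.0, np.nan, 3.0])
print(np.mean(x))     # nan: missing values propagate
print(np.nanmean(x))  # 2.0: nan is ignored
```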

*****
# Pandas
[TOP](#Intro-to-Python)

**P**ytho**N** **D**ata **A**nalysi**S** (almost pandas?)

## Series, DataFrames, and Indices
[TOP](#Intro-to-Python)

In [None]:
import pandas as pd

In [None]:
gm_gdpc = pd.read_csv('gapminder GDPc.csv')
gm_gdpc.head()

In [None]:
# Pandas core object is a DataFrame
type(gm_gdpc)

In [None]:
# Each column is a Series
type(gm_gdpc.country)

In [None]:
# DataFrames and series have indices
print(gm_gdpc.index, '\n')

s = gm_gdpc.country
print(s.head(), '\n')
s.index

In [None]:
# indices are immutable like tuples
gm_gdpc.index[0] = 'zero' # raises TypeError

In [None]:
# you can set indices from columns when loading data
gm_life = pd.read_csv('gapminder life expectancy.csv', index_col = 'country')
gm_life.head()

Check out `pd.read_csv?` for more options.
***
## Multiple Data Sets
[TOP](#Intro-to-Python)

There are three ways to combine data:

1. `df.join()` - combine multiple data frames on indices
2. `df.merge()` or `pd.merge()` - combine two data frames on either indices or columns
3. `pd.concat()` - row/column stacking
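A minimal sketch of the three approaches on two toy frames (the country labels and numbers are made up for illustration):

```python
import pandas as pd

gdp = pd.DataFrame({'country': ['A', 'B', 'C'], 'gdp': [1.0, 2.0, 3.0]})
life = pd.DataFrame({'country': ['A', 'B', 'D'], 'life': [70, 75, 80]})

# merge: match on a column; how='inner' keeps countries present in both
merged = pd.merge(gdp, life, on='country', how='inner')

# join: match on the index of the other frame
joined = gdp.set_index('country').join(life.set_index('country'), how='inner')

# concat: stack rows (default axis=0) or columns (axis=1)
stacked = pd.concat([gdp, gdp])                # 6 rows
side_by_side = pd.concat([gdp, life], axis=1)  # aligned on the default 0..2 index

print(merged)
print(merged.shape, joined.shape, stacked.shape, side_by_side.shape)
```

Note that `concat` aligns on the index, not on `country`, so the side-by-side frame pairs row 2 of each (countries C and D); use `merge` or `join` when rows should be matched by key.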

In [None]:
gm_gdpc.head(1)

In [None]:
gm_life.set_index('continent', append = True, inplace = True)

In [None]:
# With on=, the left frame can match on its columns; only the right frame needs them as its index
gm = gm_gdpc.join(gm_life, on = ['country', 'continent'], how = 'inner')

In [None]:
# To join a list of frames at once, the left frame must be indexed too
gm_gdpc.set_index(['country', 'continent'], inplace = True)

In [None]:
# They need unique column names
gm_life2 = gm_life.copy()
gm_life2.columns = gm_life2.columns.str.lower()

In [None]:
# Ta dah!
gm_gdpc.join([gm_life, gm_life2]).columns

In [None]:
print(gm.shape)
print(pd.concat([gm, gm, gm]).shape)
print(pd.concat([gm, gm, gm], axis = 'columns').shape)
print(pd.concat([gm, gm, gm], axis = 1).shape)

### Time to Make the Gap Minder Data

In [None]:
gm_pop = pd.read_csv('gapminder population.csv')
gm_pop.head()

In [None]:
# I don't like holding down shift
gm_pop.columns = gm_pop.columns.str.lower()

In [None]:
# the population data is already long (one row per country-year), but gm is still wide (one column per variable-year)
# alternative: pd.wide_to_long(gm.reset_index(), stubnames = ['gdpPercap', 'lifeExp'], i = ['country', 'continent'], j = 'year', sep = '_')
gm = pd.wide_to_long(gm.reset_index(), stubnames = ['gdpPercap', 'lifeExp'], i = 'country', j = 'year', sep = '_')
gm.head(3)

A very good resource for understanding pivoting tables: `pivot`, `pivot_table`, `stack`, and `unstack`

https://nikgrozev.com/2015/07/01/reshaping-in-pandas-pivot-pivot-table-stack-and-unstack-explained-with-pictures/
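`pd.wide_to_long` turns one column per year into one row per year, and `unstack` goes back. A toy sketch with made-up data, using the same `stubnames`/`i`/`j`/`sep` arguments as the cell above:

```python
import pandas as pd

wide = pd.DataFrame({'country': ['A', 'B'],
                     'gdpPercap_2002': [1.0, 2.0],
                     'gdpPercap_2007': [1.5, 2.5]})
# columns named <stub><sep><suffix> become rows indexed by (country, year)
long_df = pd.wide_to_long(wide, stubnames=['gdpPercap'], i='country', j='year', sep='_')
print(long_df.sort_index())

# back to wide: move the 'year' index level into the columns
back = long_df['gdpPercap'].unstack('year')
print(back)
```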

In [None]:
gm.sort_index(inplace = True)

In [None]:
gm_pop = gm_pop.drop(columns = 'continent').set_index(['country', 'year'])
gm_pop.head()

In [None]:
gm = gm.join(gm_pop)

In [None]:
# Creating a new column
gm['gdp'] = gm['pop'] * gm.gdpPercap
# Use gm['pop'], not gm.pop: DataFrame.pop() is a method, so attribute access returns the method
gm.head(1)

In [None]:
# Now, let's inspect our data frame
display(gm.describe()) # display() shows output that is not on the last line
gm.info() # shouldn't population be an integer?

In [None]:
# Locate an instance that is not an integer
print(sum(gm['pop']%1 != 0))        # how many
idx = np.where(gm['pop']%1 != 0)[0] # their positions
print(idx)
gm['pop'].iloc[idx[0]]              # the first one

In [None]:
# note: int() truncates toward zero, so round before converting
int(0.9)

In [None]:
# You can force type conversion
np.round(gm['pop']).astype(int)
# to overwrite, assign the whole column (recent pandas keeps the old dtype with gm.loc[:, 'pop'] = ...)
gm['pop'] = np.round(gm['pop']).astype(int)
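`int()` truncates toward zero, which is why the cell rounds first. Also note that both `np.round` and Python's `round` use round-half-to-even ("banker's rounding"), so exact halves go to the nearest even number:

```python
import numpy as np

print(int(0.9), int(-0.9))           # 0 0: truncation toward zero
print(np.round(0.9))                 # 1.0
print(np.round([0.5, 1.5, 2.5]))     # [0. 2. 2.]: halves go to the even neighbor
print(round(2.5))                    # 2: Python's round does the same
print(np.floor(-0.9), np.ceil(0.1))  # -1.0 1.0
```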

In [None]:
gm.info()

****************
# Matplotlib
[TOP](#Intro-to-Python)

In [None]:
import matplotlib.pyplot as plt

In [None]:
gm.head()

In [None]:
# all countries in 2007: a cross-section of the 'year' index level
gm_2007 = gm.xs(2007, level = 'year').copy()

In [None]:
gm_2007.gdpPercap.plot(kind = 'hist')

In [None]:
plt.scatter(gm_2007.gdpPercap, gm_2007.lifeExp)
# plt.show()

In [None]:
plt.figure(figsize = (8, 4.5))
# We could add colors, but we would need to create a dictionary
plt.scatter(gm_2007.gdpPercap, gm_2007.lifeExp)

plt.title('log GDPc vs Life Expectancy in 2007')
plt.ylabel('Life Expectancy (years)')
plt.xlabel('$\\frac{GDP}{Population}$')

plt.semilogx()
plt.show()

***
# Seaborn
[TOP](#Intro-to-Python)

In [None]:
import seaborn as sns

# rc = runtime configuration
sns.set(rc = {'axes.titlesize': 24,
             'axes.labelsize': 20,
             'xtick.labelsize': 12,
             'ytick.labelsize': 12,
             'figure.figsize': (8, 4.5)})

In [None]:
sns.scatterplot(data = gm_2007, x = 'gdpPercap', y = 'lifeExp',
                hue = 'continent', size = 'pop')

plt.title('log GDPc vs Life Expectancy in 2007')
plt.ylabel('Life Expectancy (years)')
plt.xlabel('$\\frac{GDP}{Population}$')

plt.semilogx()
plt.show()

In [None]:
ax = sns.scatterplot(data = gm_2007, x = 'gdpPercap', y = 'lifeExp',
                     hue = 'continent', size = 'pop', sizes = (20, 2000),
                     alpha = 0.75)
# scatterplot returns a matplotlib Axes; grab its legend entries
h, l = ax.get_legend_handles_labels()
# print(h,'\n')
# print(l)
# keep only the five continent entries
plt.legend(h[1:6], l[1:6], loc = 'lower right', title = 'Continent')

plt.title('log GDPc vs Life Expectancy in 2007')
plt.ylabel('Life Expectancy (years)')
plt.xlabel('$\\frac{GDP}{Population}$')

plt.semilogx()
plt.show()

## Other Plots
[TOP](#Intro-to-Python)

In [None]:
sns.pairplot(data = gm_2007)

In [None]:
sns.pairplot(data = gm_2007, kind = 'hist',
             plot_kws = {'bins': 10}, diag_kws = {'bins': 4})
plt.show()

In [None]:
sns.violinplot(data = gm_2007, x = 'continent', y = 'lifeExp')

In [None]:
# How about creating another discrete variable
# gm_2007['65+'] = (gm_2007.lifeExp >= 65)*1
gm_2007.loc[:, '65+'] = gm_2007.lifeExp >= 65

In [None]:
# Any SettingWithCopyWarning above is harmless here: the new column was added
gm_2007

In [None]:
# Discrete vs discrete: count countries with life expectancy >= 65 by continent
counts = gm_2007.groupby('continent')['65+'].value_counts().rename('count')
print(counts)
df = counts.reset_index().rename(columns = {'65+': r'LifeExp $\geq$ 65'})
sns.barplot(data = df, x = 'continent', y = 'count', hue = r'LifeExp $\geq$ 65')

In [None]:
# Binned scatter plot (you should know this as an economist)
from scipy.stats import binned_statistic

# binned_statistic?
n = 50
bin_mean, bin_edge, bin_number = binned_statistic(np.log(gm_2007.gdpPercap), gm_2007.lifeExp, bins = n)

x = np.average([bin_edge[:-1], bin_edge[1:]], axis = 0)

plt.scatter(x, bin_mean, label = '%d bins' % n)
plt.legend()
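For intuition, a binned scatter plot just splits x into equal-width bins and averages y within each bin. A NumPy-only sketch of that computation; the helper `binned_means` is hypothetical and is meant to mimic `binned_statistic` with its default `statistic='mean'`, including placing the right-most edge in the last bin and returning NaN for empty bins:

```python
import numpy as np

def binned_means(x, y, bins):
    """Hypothetical helper: mean of y within equal-width bins of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.linspace(x.min(), x.max(), bins + 1)
    # digitize returns 1..bins+1; shift to 0-based and put x == max into the last bin
    idx = np.clip(np.digitize(x, edges) - 1, 0, bins - 1)
    sums = np.bincount(idx, weights=y, minlength=bins)
    counts = np.bincount(idx, minlength=bins)
    with np.errstate(invalid='ignore', divide='ignore'):
        means = sums / counts  # empty bins become NaN
    centers = (edges[:-1] + edges[1:]) / 2  # bin midpoints, like x above
    return centers, means

# Usage on made-up data: y rises with x, so the bin means rise too
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 50 + 3 * x + rng.normal(0, 2, size=200)
centers, means = binned_means(x, y, bins=10)
print(np.round(means, 1))
```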

In [None]:
# Many Plots
gm_2007['lgdpc'] = np.log(gm_2007.gdpPercap)

plt.figure(figsize = (8, 6), dpi = 160)
plt.subplot(2, 2, 1)
sns.histplot(data = gm_2007, x = 'gdpPercap')

plt.subplot(222)
sns.histplot(data = gm_2007, x = 'lifeExp', bins = 3)

plt.subplot(2, 1, 2)
sns.scatterplot(data = gm_2007, x = 'lgdpc', y = 'lifeExp')
plt.title('Gap Minder 2007')

plt.tight_layout()
plt.savefig('figure')

************
# Time Series
[TOP](#Intro-to-Python)

A quick note

In [None]:
nasdaq = pd.read_csv('nasdaq.csv', index_col = 'Date')
nasdaq.index = pd.to_datetime(nasdaq.index)
nasdaq.head()

In [None]:
# Double [[]] to keep data frame
ts = nasdaq[['Close']].rename(columns = {'Close': 'y'})
ts['ly'] = np.log(ts['y'])
ts['dly'] = ts.ly.diff()
ts.tail()

In [None]:
plt.subplot(1, 3, 1)
sns.lineplot(data = ts, x = ts.index, y = 'y')

plt.subplot(1, 3, 2)
sns.lineplot(data = ts, x = 'Date', y = 'ly')

plt.subplot(1, 3, 3)
sns.lineplot(data = ts, x = 'Date', y = 'dly')

plt.tight_layout()

In [None]:
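# We model dly (the log difference, roughly the period-to-period return) because we want to predict its noise structure

# create lags: shift(1) moves each value down one row, so row t holds dly from t-1
ts['dly_lag1'] = ts.dly.shift(1)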
ts.head(3)
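`shift(1)` moves values down one row, so each row sees the previous observation (a lag); `shift(-1)` moves them up, giving the next observation (a lead). On a toy series:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])
lagged = s.shift(1)  # [NaN, 10, 20, 30]: previous value
lead = s.shift(-1)   # [20, 30, 40, NaN]: next value
print(pd.DataFrame({'s': s, 'lag1': lagged, 'lead1': lead}))
```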

In [None]:
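# To convert back, suppose we predict the next observation (position 2):
yhat = ts.dly.iloc[2]        # stand-in for a predicted log difference
ly = ts.ly.iloc[1]           # last observed log level
l_yhat = ly + yhat           # predicted log level
exp_l_yhat = np.exp(l_yhat)  # predicted level (here it recovers ts.y.iloc[2])
exp_l_yhat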