## Hands-on Beginning Python

### Reference

https://us.pycon.org/2020/schedule/presentation/53/
    
https://github.com/mattharrison/Tiny-Python-3.8-Notebook

### Data

https://github.com/COVID19Tracking/covid-tracking-data
    

#### Imports

https://www.python.org/dev/peps/pep-0008/#imports


In [1]:
# First come standard libraries, in alphabetical order
import csv
import sys
import urllib.request as req

# After a blank line, import third-party libraries
import matplotlib.pyplot as plt
import pandas as pd

# After another blank line, import local libraries

In [2]:
# python version
sys.version

'3.8.2 | packaged by conda-forge | (default, Apr 24 2020, 07:56:27) \n[Clang 9.0.1 ]'

In [3]:
! which python

/Users/jwatt/conda/envs/tiny_python_38_b/bin/python


In [4]:
pd.__version__

'1.0.3'

#### Running python

* REPL
* ipython
* bpython
* jupyter notebook

#### Learning should be
* D - Decide
* R - Relax
* M - Motivation
* O - Observe
* M - Mechanics

### Environments

* Refer to `python_envs.rst`
https://github.com/pyladies-houston/2020/tree/master/may
* Virtual environments are not meant to be shared, so create your own for your specific needs.
* You might want to stop here and do so.
* Discussion of environments will be limited to 15 minutes.

#### IDLE - Integrated Development and Learning Environment

https://docs.python.org/3/library/idle.html

* Fetch URL

In [5]:
url = 'https://raw.githubusercontent.com/COVID19Tracking/covid-tracking-data/master/data/states_daily_4pm_et.csv'

In [6]:
req

<module 'urllib.request' from '/Users/jwatt/conda/envs/tiny_python_38_b/lib/python3.8/urllib/request.py'>

In [7]:
fn = req.urlopen(url)

In [8]:
data = fn.read()

In [9]:
len(data)

553770

In [10]:
# byte string
data[:100]  # slice

b'date,state,positive,negative,pending,hospitalizedCurrently,hospitalizedCumulative,inIcuCurrently,inI'
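`fn.read()` returns `bytes` (note the `b'...'` prefix above), not `str`. A minimal sketch of converting between the two, assuming the payload is UTF-8; the sample bytes are made up, not taken from the download:

```python
# A small, made-up CSV payload as bytes (the same type fn.read() returns).
raw = b'date,state,positive\n20200509,AK,378\n'

# bytes -> str: decode with the text encoding (assumed UTF-8 here).
text = raw.decode('utf8')

# str -> bytes: encode again; for UTF-8 text the round trip is lossless.
back = text.encode('utf8')

print(type(raw).__name__, type(text).__name__)  # bytes str
print(text.splitlines()[0])                      # date,state,positive
```

Writing with `mode='wb'` below skips this step entirely, because the bytes go to disk unchanged.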

* Write

In [11]:
# important!
# path of the CSV file we write here; read_csv reads it back later

fname = '../data/covid.csv'

In [12]:
# 1

fo = open(fname, mode='wb')

In [13]:
fo.write(data)

553770

In [14]:
fo.close()

* Context manager

In [15]:
# 2

with open(fname, mode='wb') as fo:
    fo.write(data)
# the file is closed automatically when the indented block exits
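To see the context manager close the file for us, this sketch checks the file object's `closed` attribute inside and after the `with` block. It writes to a temporary directory (an assumption for the demo) so nothing under `../data` is touched:

```python
import os
import tempfile

# Use a throwaway directory so the demo never overwrites real data.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'demo.bin')

with open(path, mode='wb') as fo:
    fo.write(b'hello')
    inside = fo.closed   # still open inside the block

outside = fo.closed      # closed automatically once the block exits

print(inside, outside)   # False True
```

The `with` form also closes the file if an exception is raised inside the block, which the manual `open` / `close` pair does not guarantee.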

In [16]:
def fetch_url(url, fname):
    """
    Save a url to a local file.
    """
    fn = req.urlopen(url)
    data = fn.read()
    with open(fname, mode='wb') as fo:
        fo.write(data)
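The response returned by `urlopen` can also be used in a `with` block, so it gets closed too. A sketch of that variant; returning the byte count is an addition for quick checks, not part of the original function. The usage lines point it at a local `file://` URL (built with `pathlib.Path.as_uri()`), which `urlopen` also accepts, so it runs without network access:

```python
import pathlib
import tempfile
import urllib.request as req


def fetch_url(url, fname):
    """Save the body of ``url`` to the local file ``fname``.

    Both the response and the output file are context managers,
    so both are closed when their ``with`` blocks exit.
    """
    with req.urlopen(url) as resp:
        data = resp.read()
    with open(fname, mode='wb') as fo:
        fo.write(data)
    return len(data)  # number of bytes written (added for convenience)


# Usage: copy a small local file through a file:// URL.
tmp = pathlib.Path(tempfile.mkdtemp())
src = tmp / 'src.csv'
src.write_bytes(b'a,b\n1,2\n')
n = fetch_url(src.as_uri(), tmp / 'copy.csv')
print(n)  # 8
```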

In [17]:
dir()

['In',
 'Out',
 '_',
 '_10',
 '_13',
 '_2',
 '_4',
 '_6',
 '_9',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_dh',
 '_exit_code',
 '_i',
 '_i1',
 '_i10',
 '_i11',
 '_i12',
 '_i13',
 '_i14',
 '_i15',
 '_i16',
 '_i17',
 '_i2',
 '_i3',
 '_i4',
 '_i5',
 '_i6',
 '_i7',
 '_i8',
 '_i9',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'csv',
 'data',
 'exit',
 'fetch_url',
 'fn',
 'fname',
 'fo',
 'get_ipython',
 'pd',
 'plt',
 'quit',
 'req',
 'sys',
 'url']

In [18]:
fetch_url(url, 'out/test.csv')

* Playing with data

In [19]:
fn2 = open(fname, encoding='utf8')

In [20]:
lines = []  # empty list literal (special syntax)
for line in fn2:
    lines.append(line)

In [21]:
len(lines)

3658

In [22]:
lines[0]

'date,state,positive,negative,pending,hospitalizedCurrently,hospitalizedCumulative,inIcuCurrently,inIcuCumulative,onVentilatorCurrently,onVentilatorCumulative,recovered,dataQualityGrade,lastUpdateEt,hash,dateChecked,death,hospitalized,total,totalTestResults,posNeg,fips,deathIncrease,hospitalizedIncrease,negativeIncrease,positiveIncrease,totalTestResultsIncrease\n'

In [23]:
lines[-1]

'20200122,WA,1,,,,,,,,,,,,c6e052d394e31103a4c7fd6f1c408b9e0008a455,2020-01-22T21:00:00Z,,,1,1,1,53,,,,,'

In [24]:
lines[:3]

['date,state,positive,negative,pending,hospitalizedCurrently,hospitalizedCumulative,inIcuCurrently,inIcuCumulative,onVentilatorCurrently,onVentilatorCumulative,recovered,dataQualityGrade,lastUpdateEt,hash,dateChecked,death,hospitalized,total,totalTestResults,posNeg,fips,deathIncrease,hospitalizedIncrease,negativeIncrease,positiveIncrease,totalTestResultsIncrease\n',
 '20200509,AK,378,26071,,8,,,,,,318,C,5/9/2020 00:00,81730ed56f3a546adb227bf046219933797f9cbb,2020-05-09T20:00:00Z,10,,26449,26449,26449,02,0,0,975,1,976\n',
 '20200509,AL,9567,115927,,,1228,,459,,272,,B,5/9/2020 00:00,7c1a5067467c6ce495a787917ddccae550e81335,2020-05-09T20:00:00Z,388,1228,125494,125494,125494,01,13,21,5034,346,5380\n']

In [25]:
type(lines)

list

In [26]:
type(lines[0])

str
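The `append` loop that built `lines` can be written more compactly, because an open text file is an iterator over its lines. A sketch comparing three equivalent forms, with `io.StringIO` standing in for an open file (the contents are made up):

```python
import io

sample = 'date,state\n20200509,AK\n20200509,AL\n'

# 1. The explicit loop used above.
lines_loop = []
for line in io.StringIO(sample):
    lines_loop.append(line)

# 2. list() consumes the file iterator in one call.
lines_list = list(io.StringIO(sample))

# 3. readlines() does the same thing as a method.
lines_read = io.StringIO(sample).readlines()

print(lines_loop == lines_list == lines_read)  # True
print(len(lines_list))                         # 3
```

All three keep the trailing `'\n'` on each line, just like `lines[0]` above.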

In [27]:
dir(lines)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

In [28]:
'20200505,AL,8285,98481,,,1107,,428,,255,,B,5/5/2020 00:00,cdeeecd2210217b93fc3b08765445de51e2cebcc,2020-05-05T20:00:00Z,313,1107,106766,106766,106766,01,17,43,3389,260,3649\n' in lines

True

* Help

In [29]:
help(lines.append)

Help on built-in function append:

append(object, /) method of builtins.list instance
    Append object to the end of the list.



* Python built-in functions

https://docs.python.org/3/library/functions.html

* Continue playing with data

In [30]:
lines[1]

'20200509,AK,378,26071,,8,,,,,,318,C,5/9/2020 00:00,81730ed56f3a546adb227bf046219933797f9cbb,2020-05-09T20:00:00Z,10,,26449,26449,26449,02,0,0,975,1,976\n'

### Problem 1

In [31]:
dir(lines[1])

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']


In [32]:
lines[1].split(',')

['20200509',
 'AK',
 '378',
 '26071',
 '',
 '8',
 '',
 '',
 '',
 '',
 '',
 '318',
 'C',
 '5/9/2020 00:00',
 '81730ed56f3a546adb227bf046219933797f9cbb',
 '2020-05-09T20:00:00Z',
 '10',
 '',
 '26449',
 '26449',
 '26449',
 '02',
 '0',
 '0',
 '975',
 '1',
 '976\n']

In [33]:
# strip the trailing newline, then split on commas
lines[1].strip().split(',')

['20200509',
 'AK',
 '378',
 '26071',
 '',
 '8',
 '',
 '',
 '',
 '',
 '',
 '318',
 'C',
 '5/9/2020 00:00',
 '81730ed56f3a546adb227bf046219933797f9cbb',
 '2020-05-09T20:00:00Z',
 '10',
 '',
 '26449',
 '26449',
 '26449',
 '02',
 '0',
 '0',
 '975',
 '1',
 '976']

* Dictionary

https://docs.python.org/3/library/stdtypes.html#dict

In [34]:
d = {}
d['cat'] = "furry feline"
d['dog'] = "cozy canine"

In [35]:
d

{'cat': 'furry feline', 'dog': 'cozy canine'}

In [36]:
dir(d)

['__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'clear',
 'copy',
 'fromkeys',
 'get',
 'items',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']

In [37]:
d.keys()

dict_keys(['cat', 'dog'])

In [38]:
d.values()

dict_values(['furry feline', 'cozy canine'])

In [39]:
d.items()

dict_items([('cat', 'furry feline'), ('dog', 'cozy canine')])
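`items()` is most often used to loop over key/value pairs. A short sketch of that, plus `get()` for keys that may be missing and `in`, which tests keys rather than values:

```python
d = {'cat': 'furry feline', 'dog': 'cozy canine'}

# Looping over items() gives (key, value) pairs.
for key, value in d.items():
    print(key, '->', value)

# d['bird'] would raise KeyError; .get() returns a default instead.
print(d.get('bird'))             # None
print(d.get('bird', 'unknown'))  # unknown

# The in operator checks keys, not values.
print('cat' in d, 'furry feline' in d)  # True False
```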

* Store data in a dictionary

In [40]:
header = lines[0].strip().split(',')
line1 = lines[1].strip().split(',')

In [41]:
header

['date',
 'state',
 'positive',
 'negative',
 'pending',
 'hospitalizedCurrently',
 'hospitalizedCumulative',
 'inIcuCurrently',
 'inIcuCumulative',
 'onVentilatorCurrently',
 'onVentilatorCumulative',
 'recovered',
 'dataQualityGrade',
 'lastUpdateEt',
 'hash',
 'dateChecked',
 'death',
 'hospitalized',
 'total',
 'totalTestResults',
 'posNeg',
 'fips',
 'deathIncrease',
 'hospitalizedIncrease',
 'negativeIncrease',
 'positiveIncrease',
 'totalTestResultsIncrease']

In [42]:
line1

['20200509',
 'AK',
 '378',
 '26071',
 '',
 '8',
 '',
 '',
 '',
 '',
 '',
 '318',
 'C',
 '5/9/2020 00:00',
 '81730ed56f3a546adb227bf046219933797f9cbb',
 '2020-05-09T20:00:00Z',
 '10',
 '',
 '26449',
 '26449',
 '26449',
 '02',
 '0',
 '0',
 '975',
 '1',
 '976']

* zip

https://docs.python.org/3/library/functions.html#zip

In [43]:
zip(header, line1)

<zip at 0x11929a540>

In [44]:
list(zip(header, line1))

[('date', '20200509'),
 ('state', 'AK'),
 ('positive', '378'),
 ('negative', '26071'),
 ('pending', ''),
 ('hospitalizedCurrently', '8'),
 ('hospitalizedCumulative', ''),
 ('inIcuCurrently', ''),
 ('inIcuCumulative', ''),
 ('onVentilatorCurrently', ''),
 ('onVentilatorCumulative', ''),
 ('recovered', '318'),
 ('dataQualityGrade', 'C'),
 ('lastUpdateEt', '5/9/2020 00:00'),
 ('hash', '81730ed56f3a546adb227bf046219933797f9cbb'),
 ('dateChecked', '2020-05-09T20:00:00Z'),
 ('death', '10'),
 ('hospitalized', ''),
 ('total', '26449'),
 ('totalTestResults', '26449'),
 ('posNeg', '26449'),
 ('fips', '02'),
 ('deathIncrease', '0'),
 ('hospitalizedIncrease', '0'),
 ('negativeIncrease', '975'),
 ('positiveIncrease', '1'),
 ('totalTestResultsIncrease', '976')]

In [45]:
dict(zip(header, line1))

{'date': '20200509',
 'state': 'AK',
 'positive': '378',
 'negative': '26071',
 'pending': '',
 'hospitalizedCurrently': '8',
 'hospitalizedCumulative': '',
 'inIcuCurrently': '',
 'inIcuCumulative': '',
 'onVentilatorCurrently': '',
 'onVentilatorCumulative': '',
 'recovered': '318',
 'dataQualityGrade': 'C',
 'lastUpdateEt': '5/9/2020 00:00',
 'hash': '81730ed56f3a546adb227bf046219933797f9cbb',
 'dateChecked': '2020-05-09T20:00:00Z',
 'death': '10',
 'hospitalized': '',
 'total': '26449',
 'totalTestResults': '26449',
 'posNeg': '26449',
 'fips': '02',
 'deathIncrease': '0',
 'hospitalizedIncrease': '0',
 'negativeIncrease': '975',
 'positiveIncrease': '1',
 'totalTestResultsIncrease': '976'}
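One thing to watch with `dict(zip(header, values))`: `zip` stops at the shorter input, so a row with missing fields silently loses columns instead of raising an error. A sketch with made-up, shortened rows:

```python
header = ['date', 'state', 'positive']

# A well-formed row pairs up one value per column.
full = dict(zip(header, ['20200509', 'AK', '378']))

# A short row: zip quietly drops the unmatched 'positive' column.
short = dict(zip(header, ['20200509', 'AK']))

print(full)   # {'date': '20200509', 'state': 'AK', 'positive': '378'}
print(short)  # {'date': '20200509', 'state': 'AK'}
```

On Python 3.10 and later, `zip(header, values, strict=True)` raises `ValueError` on a length mismatch; the 3.8 environment used here does not have that option.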

* enumerate

https://docs.python.org/3/library/functions.html#enumerate

In [46]:
lst = list(d)
lst

['cat', 'dog']

In [47]:
for i, item in enumerate(lst, 0):
    print(i, item)

0 cat
1 dog
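The second argument to `enumerate` is the starting count, which defaults to `0`, so `enumerate(lst, 0)` is the same as `enumerate(lst)`. A quick sketch of both:

```python
lst = ['cat', 'dog']

# start defaults to 0.
pairs0 = list(enumerate(lst))

# start=1 is handy for human-friendly numbering.
pairs1 = list(enumerate(lst, start=1))

print(pairs0)  # [(0, 'cat'), (1, 'dog')]
print(pairs1)  # [(1, 'cat'), (2, 'dog')]
```

The `read_csv` versions below rely on the default start of `0` to recognise the header as line `i == 0`.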


* Read CSV

https://docs.python.org/3/library/csv.html?highlight=csv

In [48]:
# 1

def read_csv(fname):
    with open(fname, encoding='utf8') as csvfile:
        rows = []
        for line in csvfile:
            values = line.strip().split(',')
            if len(rows) == 0:  # same as `if not rows`
                headers = values
            else:
                rows.append(dict(zip(headers, values)))  # buggy: this branch is never reached
        return rows

In [49]:
read_csv(fname)[:2]

[]

### Problem 2

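* Rubber duck debugging: explain the code, line by line, to someone (or something) else. Here, `rows` only grows in the `else` branch, so `len(rows) == 0` is true for every line: each line just overwrites `headers`, and the function returns `[]`. The fix below uses the line index from `enumerate` to detect the header row instead.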

In [50]:
# 2

def read_csv(fname):
    # import pdb; pdb.set_trace()
    with open(fname, encoding='utf8') as csvfile:
        rows = []
        for i, line in enumerate(csvfile):
            values = line.strip().split(',')
            if i == 0:
                headers = values
            else:
                rows.append(dict(zip(headers, values)))
        return rows

In [51]:
read_csv(fname)[:2]

[{'date': '20200509',
  'state': 'AK',
  'positive': '378',
  'negative': '26071',
  'pending': '',
  'hospitalizedCurrently': '8',
  'hospitalizedCumulative': '',
  'inIcuCurrently': '',
  'inIcuCumulative': '',
  'onVentilatorCurrently': '',
  'onVentilatorCumulative': '',
  'recovered': '318',
  'dataQualityGrade': 'C',
  'lastUpdateEt': '5/9/2020 00:00',
  'hash': '81730ed56f3a546adb227bf046219933797f9cbb',
  'dateChecked': '2020-05-09T20:00:00Z',
  'death': '10',
  'hospitalized': '',
  'total': '26449',
  'totalTestResults': '26449',
  'posNeg': '26449',
  'fips': '02',
  'deathIncrease': '0',
  'hospitalizedIncrease': '0',
  'negativeIncrease': '975',
  'positiveIncrease': '1',
  'totalTestResultsIncrease': '976'},
 {'date': '20200509',
  'state': 'AL',
  'positive': '9567',
  'negative': '115927',
  'pending': '',
  'hospitalizedCurrently': '',
  'hospitalizedCumulative': '1228',
  'inIcuCurrently': '',
  'inIcuCumulative': '459',
  'onVentilatorCurrently': '',
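  'onVentilatorCumulative': '272',
  'recovered': '',
  'dataQualityGrade': 'B',
  'lastUpdateEt': '5/9/2020 00:00',
  'hash': '7c1a5067467c6ce495a787917ddccae550e81335',
  'dateChecked': '2020-05-09T20:00:00Z',
  'death': '388',
  'hospitalized': '1228',
  'total': '125494',
  'totalTestResults': '125494',
  'posNeg': '125494',
  'fips': '01',
  'deathIncrease': '13',
  'hospitalizedIncrease': '21',
  'negativeIncrease': '5034',
  'positiveIncrease': '346',
  'totalTestResultsIncrease': '5380'}]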

* pdb / `breakpoint()`

https://docs.python.org/3/library/pdb.html?highlight=pdb#module-pdb

* Filter

In [52]:
res = read_csv(fname)
len(res)

3657

In [53]:
tx_res = []
for row in res:
    if row['state'] == 'TX':
        tx_res.append(row)

In [54]:
len(tx_res)

67

In [55]:
tx_res[0]

{'date': '20200509',
 'state': 'TX',
 'positive': '37860',
 'negative': '451434',
 'pending': '',
 'hospitalizedCurrently': '1735',
 'hospitalizedCumulative': '',
 'inIcuCurrently': '',
 'inIcuCumulative': '',
 'onVentilatorCurrently': '',
 'onVentilatorCumulative': '',
 'recovered': '20141',
 'dataQualityGrade': 'B',
 'lastUpdateEt': '5/9/2020 13:15',
 'hash': 'deeecea989a19b1e2c2f50a4fa57cec56a0ec1c2',
 'dateChecked': '2020-05-09T20:00:00Z',
 'death': '1049',
 'hospitalized': '',
 'total': '489294',
 'totalTestResults': '489294',
 'posNeg': '489294',
 'fips': '48',
 'deathIncrease': '45',
 'hospitalizedIncrease': '0',
 'negativeIncrease': '10925',
 'positiveIncrease': '1251',
 'totalTestResultsIncrease': '12176'}

In [56]:
tx_res[0]['positive']

'37860'

In [57]:
type(tx_res[0]['positive'])

str

In [58]:
# 3

def read_csv(fname):
    #import pdb; pdb.set_trace()
    with open(fname, encoding='utf8') as csvfile:
        rows = []
        for i, line in enumerate(csvfile):
            values = line.strip().split(',')
            if i == 0:
                headers = values
            else:
                for j, val in enumerate(values):
                    val = int(val)
                    values[j] = val
                rows.append(dict(zip(headers, values)))
    return rows

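### Problem 3

`int()` raises `ValueError` for any field that is not an integer literal; in the first data row that is the state code `'AK'`. Blank fields (`''`) and timestamps such as `'5/9/2020 00:00'` fail the same way.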

In [59]:
read_csv(fname)[:2]

ValueError: invalid literal for int() with base 10: 'AK'

In [None]:
# 4 --> working!

def read_csv(fname):
    #import pdb; pdb.set_trace()
    with open(fname, encoding='utf8') as csvfile:
        rows = []
        for i, line in enumerate(csvfile):
            values = line.strip().split(',')
            if i == 0:
                headers = values
            else:
                for j, val in enumerate(values):
                    try:
                        val = int(val)
                    except ValueError:
                        pass
                    else:
                        values[j] = val
                rows.append(dict(zip(headers, values)))
    return rows
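The `try` / `except` / `else` block can be pulled out into a small helper, which also makes its behaviour easy to check on single values. `to_int` is a hypothetical name, not part of the notebook; the sample row is made up in the shape of the CSV:

```python
def to_int(val):
    """Return int(val) if val is an integer literal, else val unchanged.

    Hypothetical helper factoring out the try/except from read_csv.
    """
    try:
        return int(val)
    except ValueError:
        return val


row = ['20200509', 'AK', '378', '', '02']
converted = [to_int(v) for v in row]
print(converted)  # [20200509, 'AK', 378, '', 2]
```

Note two side effects that also apply to `read_csv`: blank fields stay as `''`, and codes with leading zeros such as the `fips` value `'02'` become the integer `2`.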

In [None]:
read_csv(fname)[:2]

In [None]:
# Pause and make sure this function is working...

In [None]:
res = read_csv(fname)
len(res)

* Filter

In [None]:
# 1

def filter(rows, state):
    res = []
    for row in res:
        if row['state'] == state:
            res.append(row)
    return res

In [None]:
res_tx = filter(res, 'TX')
len(res_tx)

### Problem 4

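More problems; get used to errors. This version loops over `res`, the new empty list, instead of the `rows` argument, so it always returns an empty list.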

In [None]:
# 2

def filter(rows, state):
    res = []
    for row in rows:
        if row['state'] == state:
            res.append(row)
    return res

In [None]:
res_tx = filter(res, 'TX')
len(res_tx)

In [None]:
res_tx[0]

In [None]:
# 3

def filter(rows, state):
    res = [row for row in rows if row['state'] == state]
    return res
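Naming the function `filter` hides Python's built-in `filter()` for the rest of the session. A sketch under a hypothetical non-clashing name, `filter_state`, compared with the built-in plus a `lambda`; the rows are made-up samples:

```python
rows = [
    {'state': 'TX', 'positive': 1},
    {'state': 'AK', 'positive': 2},
    {'state': 'TX', 'positive': 3},
]


def filter_state(rows, state):
    """Keep rows whose 'state' equals ``state`` (hypothetical name)."""
    return [row for row in rows if row['state'] == state]


# The built-in filter() returns a lazy iterator; list() materialises it.
via_builtin = list(filter(lambda row: row['state'] == 'TX', rows))
via_comprehension = filter_state(rows, 'TX')

print(via_builtin == via_comprehension)  # True
print(len(via_comprehension))            # 2
```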

In [None]:
res_tx = filter(res, 'TX')
len(res_tx)

In [None]:
pos = [row['positive'] for row in res_tx]
len(pos)

In [None]:
dir(pos)

In [None]:
pos.reverse()
pos

In [None]:
def get_date(row):
    return row['date']

* Sort

In [None]:
tx_sorted = sorted(res_tx, key=get_date)

In [None]:
len(tx_sorted)

In [None]:
tx_sorted[0]

In [None]:
tx_sorted[-1]
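`sorted()` returns a new list, while `list.sort()` sorts in place and returns `None`, just like `pos.reverse()` earlier. A sketch with made-up `YYYYMMDD` date strings, which sort chronologically even as text:

```python
dates = ['20200509', '20200307', '20200420']

# sorted() builds a new list; the original is unchanged at this point.
new = sorted(dates)

# list.sort() reorders dates in place and returns None.
result = dates.sort()

print(new)     # ['20200307', '20200420', '20200509']
print(dates)   # ['20200307', '20200420', '20200509']
print(result)  # None

# reverse=True gives newest first.
print(sorted(dates, reverse=True))  # ['20200509', '20200420', '20200307']
```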

In [None]:
# 1

def sortby(rows, col_name):
    def get_col_name(row):
        return row[col_name]
    return sorted(rows, key=col_name)

In [None]:
res = read_csv(fname)
tx_res = filter(res, 'TX')
tx_res = sortby(tx_res, 'date')

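### Problem 5

`sorted` expects `key` to be a function, but `key=col_name` passes the string `'date'`, so the call raises `TypeError: 'str' object is not callable`. The nested `get_col_name` is defined but never used.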

In [None]:
# 2

def sortby(rows, col_name):
    def get_col_name(row):
        return row[col_name]
    return sorted(rows, key=get_col_name)
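The standard library's `operator.itemgetter(col_name)` builds the same kind of key function as the nested `get_col_name`. A sketch of `sortby` using it; the `reverse` parameter is an addition for illustration, and the rows are made up:

```python
from operator import itemgetter


def sortby(rows, col_name, reverse=False):
    """Sort dict rows by one column.

    itemgetter(col_name) returns a function equivalent to
    lambda row: row[col_name]. `reverse` is an added option.
    """
    return sorted(rows, key=itemgetter(col_name), reverse=reverse)


rows = [{'date': 20200509, 'state': 'TX'}, {'date': 20200307, 'state': 'TX'}]
print([r['date'] for r in sortby(rows, 'date')])                # [20200307, 20200509]
print([r['date'] for r in sortby(rows, 'date', reverse=True)])  # [20200509, 20200307]
```

The nested-function version is arguably clearer for beginners; `itemgetter` is simply the shorter, common idiom.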

In [None]:
res = read_csv(fname)
tx_res = filter(res, 'TX')
tx_res = sortby(tx_res, 'date')

In [None]:
len(tx_res)

In [None]:
tx_res[0]

In [None]:
# Make sure all are working...

### Exercise

Use the Python `csv` module.

https://docs.python.org/3/library/csv.html?highlight=csv

In [None]:
# csv.reader

def csv_reader(fname):
    pass

In [None]:
# csv_reader(fname)[:2]

* Pandas

https://pandas.pydata.org/

In [None]:
df = pd.read_csv(fname, sep=',')

In [None]:
type(df)

In [None]:
len(df)

In [None]:
df.head(3)

In [None]:
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html

df.to_dict('records')[:2]

In [None]:
# Make sure to save your notebook / REPL session...

* Plotting

https://matplotlib.org/index.html

https://matplotlib.org/gallery/lines_bars_and_markers/simple_plot.html#sphx-glr-gallery-lines-bars-and-markers-simple-plot-py

* Install matplotlib

https://pypi.org/project/matplotlib/

In [None]:
dir()

In [None]:
# dir(plt)

In [None]:
# 1

def get_value(rows, col_name):
    res = []
    for row in rows:
        res.append(row[col_name])
    return res

In [None]:
pos = get_value(res, 'positive')
len(pos)

In [None]:
# 2

def get_value(rows, col_name):
    res = [row[col_name] for row in rows]
    return res
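`read_csv` leaves blank cells as `''`. For example, the TX row shown earlier has `'hospitalized': ''`, and a list mixing `''` with integers may not plot as intended. A sketch of a hypothetical `get_numeric` helper that swaps blanks for `float('nan')`. It keeps a placeholder rather than dropping the row, so each value stays at the same x position. matplotlib typically leaves a gap at NaN points in a line plot. The sample rows are made up:

```python
import math


def get_numeric(rows, col_name):
    """Like get_value, but blank cells become float('nan').

    Hypothetical helper; assumes non-blank cells were already
    converted to int by read_csv.
    """
    return [float('nan') if row[col_name] == '' else row[col_name] for row in rows]


rows = [{'hospitalized': ''}, {'hospitalized': 5}, {'hospitalized': 7}]
vals = get_numeric(rows, 'hospitalized')
print(vals)                 # [nan, 5, 7]
print(math.isnan(vals[0]))  # True
```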

In [None]:
tx_res[0].keys()

In [None]:
pos = get_value(res, 'positive')
len(pos)

In [None]:
# data for plotting
# x axis: row position (one point per day, in date order after sortby)
# y axis: the 'positive' column

fig, ax = plt.subplots()
ax.plot(get_value(tx_res, 'positive'))
# plt.show()

In [None]:
fig, ax = plt.subplots()
ax.plot(get_value(tx_res, 'death'))
# plt.show()

In [None]:
fig, ax = plt.subplots()
ax.plot(get_value(tx_res, 'hospitalized'))
# plt.show()

In [None]:
# all three plots
fig, ax = plt.subplots()
ax.plot(get_value(tx_res, 'positive'))
ax.plot(get_value(tx_res, 'death'))
ax.plot(get_value(tx_res, 'hospitalized'))
# plt.show()

In [None]:
fig.savefig('out/texas.png')

In [None]:
ls out/

### Solutions

* Exercise: use Python csv module

In [None]:
def csv_reader(fname):
    # newline='' lets the csv module handle line endings itself (per the csv docs)
    with open(fname, encoding='utf8', newline='') as csvfile:
        reader = csv.reader(csvfile)  # not named csv_reader, to avoid shadowing this function
        rows = []
        for i, values in enumerate(reader):
            # csv.reader already drops the newline and splits fields (respecting quotes)
            if i == 0:
                headers = values
            else:
                for j, val in enumerate(values):
                    try:
                        val = int(val)
                    except ValueError:
                        pass
                    else:
                        values[j] = val
                rows.append(dict(zip(headers, values)))
        return rows

In [None]:
csv_reader(fname)[:2]
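The `csv` module also offers `csv.DictReader`, which reads the header row itself and yields one dict per data row. It also handles quoted fields that contain commas, which `str.split(',')` breaks apart. A sketch on made-up CSV text (this file may not contain such fields; the sample is only for illustration):

```python
import csv
import io

# Made-up CSV text: the 'note' field contains a comma inside quotes.
sample = 'date,note\n20200509,"revised, see source"\n'

# The hand-rolled approach splits the quoted field in two.
naive = sample.splitlines()[1].split(',')
print(naive)  # ['20200509', '"revised', ' see source"']

# DictReader uses the first row as keys. With a real file, open it
# with newline='' as the csv docs recommend.
rows = list(csv.DictReader(io.StringIO(sample)))
print(rows[0]['note'])  # revised, see source
```

Values from `DictReader` are still strings, so the `int()` conversion step is still needed.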