
## Test for Reading CSV files
### PH 212 COCC
#### Bruce Emerson 7/27/20

We have been working with the Arduino to create a data logger that records data on a microSD card. Assuming we format the file appropriately as a CSV file, the pandas library is delighted to read the data into what's called a data frame.

This quick notebook is an initial test of that process.


### Dependencies

The new dependency here is the [Pandas](https://pandas.pydata.org/) library, which was developed to support data science applications in Python. It is conventionally imported with the alias pd. It is installed as part of your Anaconda package. When you update Anaconda you can also update Pandas.

In [1]:
import numpy as np
import pandas as pd
import matplotlib as mplot
import matplotlib.pyplot as plt
from numpy.polynomial import polynomial as ply

### Read csv from Pandas

As we get deeper into Python we will need to begin to develop a richer understanding of how Python works. We can do this incrementally, so don't panic. In Pandas there are a variety of [data structures](https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe) which are described in the pandas documentation. There is clearly a lot to learn, but we will focus on [data frames](https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe), which are a data structure that matches our normal understanding of a two-dimensional data set: rows of records and named columns.
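To make the idea of a data frame concrete, here is a minimal sketch that reads a tiny CSV held in a string rather than a file. The column names `time_ms` and `temperature_C` and the values are made up for illustration; they are only an assumption about what a data logger file might look like.

```python
import io
import pandas as pd

# Hypothetical logger output: one header row, then one comma-separated row per sample.
csv_text = """time_ms,temperature_C
0,21.5
1000,21.7
2000,22.0
"""

# read_csv accepts a file path or any file-like object;
# StringIO stands in here for the file on the microSD card.
logger_data = pd.read_csv(io.StringIO(csv_text))

print(logger_data)        # rows are samples, columns are named by the header row
print(logger_data.shape)  # (number of rows, number of columns)
```

The header row becomes the column names and every later row becomes one record, which is why formatting the logger file consistently matters.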

Here is a link to the last 30 days of data by county across the country from the [NY Times data github](https://github.com/nytimes/covid-19-data/blob/master/us-counties-recent.csv). 

In [9]:
dataSet = pd.read_csv("data/OregonDailydata.csv")
print(dataSet)

         date state  positive  negative  pending  hospitalizedCurrently  \
0    20200728    OR     17416    370240      NaN                  230.0   
1    20200727    OR     17088    365478      NaN                  237.0   
2    20200726    OR     16758    361717      NaN                  233.0   
3    20200725    OR     16492    357518      NaN                  233.0   
4    20200724    OR     16104    350463      NaN                  233.0   
..        ...   ...       ...       ...      ...                    ...   
142  20200308    OR        14       100     53.0                    NaN   
143  20200307    OR         7        77     40.0                    NaN   
144  20200306    OR         3        64     28.0                    NaN   
145  20200305    OR         3        45     13.0                    NaN   
146  20200304    OR         3        29     18.0                    NaN   

     hospitalizedCumulative  inIcuCurrently  inIcuCumulative  \
0                    1537.0        

### DataFrame Attributes

DataFrame is a Python class, which is to say it is a creature that has various predefined characteristics called attributes. These are created to make pulling out discrete portions of the data set easier. What I want you to understand is that these attributes exist and that you can recognize them in the code when you see an object like our dataSet (which is a DataFrame) with a .something appended to it. Here is an accessible discussion of [important attributes of the DataFrame class.](https://pythontic.com/pandas/dataframe-attributes/introduction)

The dir() function asks Python to list the names of all the attributes and methods that belong to an object (in this case the DataFrame class). This is definitely overkill but will give you a sense of what's out there. The ones we are most interested in are at the end of the list and are of the form 'name', with no leading underscores.
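Since the full listing is long, one way to get to the interesting part faster is to filter out the names that start with an underscore. This is only a sketch of that idea; the variable names are my own.

```python
import pandas as pd

# dir() returns a sorted list of attribute and method names as strings.
all_names = dir(pd.DataFrame)

# Keep only the "public" names, i.e. those without a leading underscore.
public_names = [name for name in all_names if not name.startswith("_")]

print(len(all_names), "names in total,", len(public_names), "without a leading underscore")
print(public_names[:10])  # a short sample of the list
```

Names such as `columns`, `index`, and `dtypes` show up in the filtered list.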

In [None]:
dir(pd.DataFrame)

### Using Attributes

Here are some examples:

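dataSet.index tells us how the rows are labeled and therefore how many there are. Notice that the RangeIndex runs from 0 up to, but not including, 147, so there are 147 data rows labeled 0 through 146; the header row is not counted.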

In [27]:
print(dataSet.index)
print(dataSet.columns)
print(dataSet.dtypes)

RangeIndex(start=0, stop=147, step=1)
Index(['date', 'state', 'positive', 'negative', 'pending',
       'hospitalizedCurrently', 'hospitalizedCumulative', 'inIcuCurrently',
       'inIcuCumulative', 'onVentilatorCurrently', 'onVentilatorCumulative',
       'recovered', 'dataQualityGrade', 'lastUpdateEt', 'dateModified',
       'checkTimeEt', 'death', 'hospitalized', 'dateChecked',
       'totalTestsViral', 'positiveTestsViral', 'negativeTestsViral',
       'positiveCasesViral', 'deathConfirmed', 'deathProbable', 'fips',
       'positiveIncrease', 'negativeIncrease', 'total', 'totalTestResults',
       'totalTestResultsIncrease', 'posNeg', 'deathIncrease',
       'hospitalizedIncrease', 'hash', 'commercialScore',
       'negativeRegularScore', 'negativeScore', 'positiveScore', 'score',
       'grade'],
      dtype='object')
date                          int64
state                        object
positive                      int64
negative                      int64
pending              
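The same attributes are easier to read on a small frame. The sketch below builds a three-row stand-in using values copied from the first rows of the output earlier in this notebook; the name `small` is just for illustration.

```python
import pandas as pd

# A three-row stand-in for dataSet, built from a dictionary of columns.
small = pd.DataFrame({
    "date": [20200728, 20200727, 20200726],
    "state": ["OR", "OR", "OR"],
    "positive": [17416, 17088, 16758],
})

print(small.index)       # the row labels
print(len(small.index))  # number of rows
print(small.columns)     # the column names
print(small.dtypes)      # the data type pandas chose for each column
print(small.shape)       # (rows, columns) in one tuple
```

Note that `stop` in a RangeIndex is one past the last label, just like Python's `range`.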

pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])
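The line above matches an example in the pandas data structures guide, where `d` is a dictionary of Series. The definition of `d` below is my assumption, modeled on that guide, so treat this as a sketch of what the call does rather than part of this notebook's data.

```python
import pandas as pd

# Assumed definition of d: a dictionary of Series with labeled rows.
d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}

# index= picks (and orders) the rows; columns= picks the columns.
# A requested column that does not exist in d ("three") is filled with NaN.
subset = pd.DataFrame(d, index=["d", "b", "a"], columns=["two", "three"])
print(subset)
```

The point to take away is that the `columns` argument controls which columns end up in the new DataFrame.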

In [18]:
pd.DataFrame(dataSet, columns = ['date','positive'])

         date  positive
0    20200728     17416
1    20200727     17088
2    20200726     16758
3    20200725     16492
4    20200724     16104
..        ...       ...
142  20200308        14
143  20200307         7
144  20200306         3
145  20200305         3
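Passing an existing DataFrame back through pd.DataFrame with a columns list, as in the cell above, keeps just those columns. A more common idiom is to index the frame with a list of column names. Here is a minimal sketch comparing the two on a small stand-in frame (values copied from the first rows of the output; the variable names are my own).

```python
import pandas as pd

sample = pd.DataFrame({
    "date": [20200728, 20200727, 20200726],
    "state": ["OR", "OR", "OR"],
    "positive": [17416, 17088, 16758],
})

# Approach used in the cell above: rebuild the frame with only some columns.
via_constructor = pd.DataFrame(sample, columns=["date", "positive"])

# Common alternative: double brackets select a list of columns.
via_brackets = sample[["date", "positive"]]

print(via_brackets)
print(via_constructor.equals(via_brackets))  # compares the two results
```

Either form leaves the original frame untouched and returns a new, narrower one.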


In [21]:
dir(pd.DataFrame)

['T',
 '_AXIS_ALIASES',
 '_AXIS_IALIASES',
 '_AXIS_LEN',
 '_AXIS_NAMES',
 '_AXIS_NUMBERS',
 '_AXIS_ORDERS',
 '_AXIS_REVERSED',
 '__abs__',
 '__add__',
 '__and__',
 '__annotations__',
 '__array__',
 '__array_priority__',
 '__array_wrap__',
 '__bool__',
 '__class__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__dir__',
 '__div__',
 '__doc__',
 '__eq__',
 '__finalize__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__imod__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__module__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdiv__',
 '__reduce__',
 '__reduce_e

In [22]:
pd.Dataframe

AttributeError: module 'pandas' has no attribute 'Dataframe'
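The error comes from capitalization: Python names are case-sensitive, and the class is spelled DataFrame with a capital F. A quick sketch of how to check a name before using it, and how the error can be caught:

```python
import pandas as pd

# hasattr reports whether a name exists on an object (here, the pandas module).
print(hasattr(pd, "DataFrame"))  # correct spelling
print(hasattr(pd, "Dataframe"))  # lowercase f: a different name as far as Python is concerned

# Asking for the misspelled name raises AttributeError, which can be caught.
try:
    pd.Dataframe
except AttributeError as err:
    print("AttributeError:", err)
```

When you see an AttributeError like this, checking the spelling and capitalization against the dir() listing is usually the quickest fix.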