### Using the Requests Library to Import a .csv Data File

When I wrote a basic .csv data file tool for looking at the most recent data, I realized it would be nice to embed code that reads the most recent COVID data from the NY Times GitHub repository and saves it for use by the COVID notebook.

**Requests** is clearly the tool most people use, but in my usual bumbling way my incomplete understanding of how Python is structured led to some confusion. I finally found an [example to poach](https://medium.com/towards-entrepreneurship/importing-a-csv-file-from-github-in-a-jupyter-notebook-e2c28e7e74a5) that was helpful in getting me on the right track (thanks to the Medium community).

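```io``` is the standard-library module for [working with I/O streams](https://docs.python.org/3/library/io.html). The ```.get``` call returns a Response object, and its ```.content``` is the file's raw bytes. ```io.StringIO``` wraps the decoded text in an in-memory file-like object, which is the kind of input ```pd.read_csv``` expects.

Once the downloaded bytes are decoded to a string and wrapped in ```io.StringIO```, pandas reads them the normal way to create the data frame, just as it would read a file on disk.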
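Here is a minimal offline sketch of that bytes → string → file-like → data frame chain. It assumes a tiny made-up CSV payload in place of the real download. The case and death numbers in `payload` are illustrative, not NY Times data.

```python
import io

import pandas as pd

# Stand-in for rObject.content: the raw response body arrives as bytes, not str.
# The counts in this row are made up for illustration only.
payload = b"date,county,state,fips,cases,deaths\n2021-04-06,Deschutes,Oregon,41017,10,1\n"

text = payload.decode("utf-8")  # bytes -> str
buffer = io.StringIO(text)      # str -> in-memory text "file"
frame = pd.read_csv(buffer)     # pandas reads it like a file on disk

print(type(payload).__name__, type(text).__name__)  # bytes str
print(frame.shape)                                  # (1, 6)
```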

[Documentation for the requests library](https://2.python-requests.org/en/master/api/#requests)


```python
import pandas as pd
import requests
import io
```

### Bruce Learning:

```requests``` is a library for communicating with resources on the web, such as images, files and web pages. ```requests.get()``` is the most typical use, and it is the one I am using here. It returns a Response object. ```dir()``` lists the attributes and methods defined on the Response class:

```python
dir(requests.Response)
```

```
['__attrs__',
 '__bool__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__enter__',
 '__eq__',
 '__exit__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__nonzero__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'apparent_encoding',
 'close',
 'content',
 'is_permanent_redirect',
 'is_redirect',
 'iter_content',
 'iter_lines',
 'json',
 'links',
 'next',
 'ok',
 'raise_for_status',
 'text']
```
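One catch with `dir(requests.Response)`: it inspects the class, so it only lists names defined on the class itself. Attributes such as `status_code`, `headers`, `encoding` and `url` are assigned inside `__init__`, so they appear only on an actual Response instance. The sketch below shows the difference without any network request. It builds a bare `requests.Response()` by hand purely for inspection; normally `requests.get` creates the object for you.

```python
import requests

# Names visible on the class vs. on a freshly constructed instance.
class_names = set(dir(requests.Response))
instance_names = set(dir(requests.Response()))

# Attributes that only exist once __init__ has run (status_code, headers, ...).
instance_only = sorted(instance_names - class_names)
print(instance_only)
```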

```python
# Download the csv file from GitHub. The URL must point to the raw file,
# not to the GitHub page that displays it.
url = "https://github.com/nytimes/covid-19-data/blob/master/us-counties-recent.csv?raw=true"  # '?raw=true' asks for the raw version of the file

# Or, as a one-liner:
# downloadContent = requests.get(url).content
rObject = requests.get(url, timeout=30)
rObject.raise_for_status()  # stop here if GitHub returned an error status (404, 500, ...)

downloadContent = rObject.content
# Uncommenting the next line shows the raw bytes delivered by requests.get():
# a continuous stream of characters with embedded \n (newline) characters.
# downloadContent
```
### Exploring the Attributes:

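Looking at the attributes was helpful. ```rObject.apparent_encoding``` reports 'utf-8' for this content, which explains why the content is decoded as 'utf-8' below. ```rObject.ok``` is a quick check that the request succeeded: it is ```True``` for any status code below 400.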

```python
# rObject.apparent_encoding
rObject.ok
```

```
True
```
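For a script that refreshes data unattended, `raise_for_status()` is often more useful than checking `ok`. It raises `requests.HTTPError` instead of letting an error page flow into `pd.read_csv`. The sketch below builds bare Response objects by hand so the behavior can be seen offline. Setting `status_code` manually is only for illustration; a real response gets it from the server.

```python
import requests

# Build responses by hand so no network access is needed.
good = requests.Response()
good.status_code = 200

missing = requests.Response()
missing.status_code = 404

print(good.ok, missing.ok)  # True False

try:
    missing.raise_for_status()
except requests.HTTPError as err:
    print("HTTPError:", err)
```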

### Decoding the request content:

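```downloadContent.decode()```: the ```.content``` of the response is a ```bytes``` object, not a string, and ```bytes.decode()``` converts those bytes into a Python ```str``` using the named encoding. ```apparent_encoding``` is a property of the Response object that guesses the encoding by inspecting the bytes, and it was 'utf-8' for this content. ```rObject.text``` does this decoding automatically. It uses ```rObject.encoding```, which comes from the response headers, and falls back to ```apparent_encoding``` when the headers name none. Would be good to explore this encoding attribute more later.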

```python
decodedContent = downloadContent.decode('utf-8')
# decodedContent
```
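Why the encoding name matters: the same bytes decode to different text under different encodings. Here is a minimal sketch using a county name with a non-ASCII letter. "Doña Ana" is simply a convenient example string.

```python
# UTF-8 stores 'ñ' as the two bytes 0xC3 0xB1.
raw = "Doña Ana".encode("utf-8")
print(raw)  # b'Do\xc3\xb1a Ana'

print(raw.decode("utf-8"))    # Doña Ana
print(raw.decode("latin-1"))  # DoÃ±a Ana  (each byte read as its own character)
```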

```python
# Reading the downloaded content and turning it into a pandas dataframe
dataFrame = pd.read_csv(io.StringIO(decodedContent))

# Filter for Deschutes County
dataDeschutes = dataFrame.loc[dataFrame['county'] == 'Deschutes']

# Printing out the first 5 rows of the data set
# just to see what I have.
print(dataDeschutes.head())
```

```
             date     county   state     fips  cases  deaths
2235   2021-04-06  Deschutes  Oregon  41017.0   6504    71.0
5484   2021-04-07  Deschutes  Oregon  41017.0   6535    72.0
8733   2021-04-08  Deschutes  Oregon  41017.0   6581    72.0
11981  2021-04-09  Deschutes  Oregon  41017.0   6633    72.0
15229  2021-04-10  Deschutes  Oregon  41017.0   6706    72.0
```
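Two details in this output are worth a note.

* The `fips` column prints as `41017.0` because some rows in the NY Times file have no FIPS code (the combined New York City entry, for example). pandas stores a numeric column containing blanks as `float64`, and the same is likely true of `deaths`.
* County names are not unique across states; there are Washington counties in many states. Filtering on `county` alone can therefore pull in rows from other states.

The sketch below uses a few illustrative rows; the case and death counts are made up. It shows a nullable integer `fips` column and a county-plus-state filter.

```python
import io

import pandas as pd

# Illustrative rows in the NY Times column layout; the case and death counts are made up.
sample = """date,county,state,fips,cases,deaths
2021-04-06,Deschutes,Oregon,41017,100,1
2021-04-06,Washington,Oregon,41067,200,2
2021-04-06,Washington,Utah,49053,300,3
2021-04-06,New York City,New York,,400,4
"""

# Default parsing: the blank fips becomes NaN, so the whole column is float64.
default = pd.read_csv(io.StringIO(sample))
print(default["fips"].dtype)  # float64

# The nullable integer dtype keeps whole numbers and marks the gap as <NA>.
typed = pd.read_csv(io.StringIO(sample), dtype={"fips": "Int64"})
print(typed["fips"].tolist())

# Match on both county and state so same-named counties elsewhere are excluded.
washington_or = typed.loc[(typed["county"] == "Washington") & (typed["state"] == "Oregon")]
print(washington_or)
```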


### Write the Data Frame to a .csv File:

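This works well. ```index=False``` leaves the data frame's row index labels (2235, 5484, ... above) out of the file instead of writing them as an extra first column. This is a good plan. Note that ```sep='\t'``` makes the file tab-separated even though it has a .csv name, so it has to be read back with the same separator.

```to_csv``` overwrites the file if it already exists. Handy, though scary, and useful in this context, where I want to update the COVID data for my analysis each time I run it.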
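A small round-trip sketch makes the ```index=False``` point concrete. It writes to a temporary folder rather than `data/`, so nothing real is touched. `os.makedirs(..., exist_ok=True)` covers the case where the target folder does not exist yet, because `to_csv` does not create missing directories.

```python
import os
import tempfile

import pandas as pd

# Made-up frame whose index labels are not 0..n-1, like the filtered Deschutes rows.
frame = pd.DataFrame({"county": ["A", "B"], "cases": [1, 2]}, index=[2235, 5484])

with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "data")
    os.makedirs(folder, exist_ok=True)  # to_csv does not create missing folders
    path = os.path.join(folder, "example.csv")

    # Default: the index is written as an extra, unnamed first column (tab-separated here).
    frame.to_csv(path, sep="\t")
    with_index = pd.read_csv(path, sep="\t")

    # index=False: only the real columns are written (this call overwrites the file).
    frame.to_csv(path, sep="\t", index=False)
    without_index = pd.read_csv(path, sep="\t")

print(list(with_index.columns))     # ['Unnamed: 0', 'county', 'cases']
print(list(without_index.columns))  # ['county', 'cases']
```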

```python
dataDeschutes.to_csv('data/dataDeschutes.csv', sep='\t', index=False)
```

```python
readData = pd.read_csv('data/dataDeschutes.csv', sep='\t')
print(readData.head())
```

```
         date     county   state     fips  cases  deaths
0  2021-04-06  Deschutes  Oregon  41017.0   6504    71.0
1  2021-04-07  Deschutes  Oregon  41017.0   6535    72.0
2  2021-04-08  Deschutes  Oregon  41017.0   6581    72.0
3  2021-04-09  Deschutes  Oregon  41017.0   6633    72.0
4  2021-04-10  Deschutes  Oregon  41017.0   6706    72.0
```
