# DSCI 511: Data acquisition and pre-processing<br>Chapter 1: Introduction, process, and getting started with data
## Exercises
Note: numberings refer to the main notes.

#### 1.4.2.3 Exercise: Loading data from a JSON file
Load the example JSON file (`colors.json`), which contains some data about colors ([source](https://github.com/corysimmons/colors.json/blob/master/colors.json)), using Python's `json` module. Then use the name of a color, e.g., `name = 'magenta'`, as a _key_ into the resulting dictionary (more on these in Chapter 2), e.g., `result[name]`, to see the RGB intensities that mix to make the color.

In [2]:
## code here
import json
with open('./data/colors.json','r') as fh:
    colors=json.load(fh)
print(colors['magenta'])

[255, 0, 255, 1]
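
As a small follow-up sketch, the same key lookup can be tried without the file on disk. The two-entry `sample_text` string below is a hypothetical stand-in that mimics the shape of `colors.json` as seen in the output above; treating the fourth number as an alpha (opacity) value is an assumption, since the exercise only mentions RGB intensities.

```python
import json

# Hypothetical two-entry sample shaped like colors.json:
# each color name maps to a list of four numbers.
sample_text = '{"magenta": [255, 0, 255, 1], "cyan": [0, 255, 255, 1]}'

# json.loads() parses a string; json.load() (used above) reads from a file handle.
colors = json.loads(sample_text)

name = 'magenta'
# Assumption: the fourth value is an alpha channel, not an RGB intensity.
red, green, blue, alpha = colors[name]
print(name, red, green, blue)

# dict.get() returns a default instead of raising KeyError for a missing name.
print(colors.get('not-a-color', 'unknown color'))
```

Unpacking the list into named variables makes it clearer which intensity is which than indexing with `colors[name][0]`.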


#### 1.4.2.5 Exercise: XML
Load an [example XML file](https://gist.github.com/sghael/2930380) of colors (`colors.xml`) and extract a single color of your choosing from the result.

In [16]:
## code here
import xmltodict

with open('./data/colors.xml','rb') as fh:
    xml_data = fh.read()
dict_xml = xmltodict.parse(xml_data)
# print(dict_xml['resources']['color'])
colors = {i['@name']:i['#text'] for i in dict_xml['resources']['color'] }
### you can also do this without a dict comprehension, as below
# colors = {}
# for i in dict_xml['resources']['color']:
#     colors[i['@name']] = i['#text']
print(colors['magenta'])

#FF00FF
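
For comparison, here is a minimal sketch of the same extraction using the standard library's `xml.etree.ElementTree` instead of `xmltodict`. The inline `sample_xml` string is a hypothetical two-color excerpt; its layout (a `resources` root holding `color` elements with a `name` attribute) is inferred from the keys used in the solution above, not copied from the actual file.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt shaped like colors.xml: <resources> holds <color>
# elements whose "name" attribute labels a hex code stored as element text.
sample_xml = """<resources>
    <color name="magenta">#FF00FF</color>
    <color name="cyan">#00FFFF</color>
</resources>"""

root = ET.fromstring(sample_xml)  # root is the <resources> element

# Build the same name -> hex-code mapping as the xmltodict solution.
colors = {el.get('name'): el.text for el in root.findall('color')}
print(colors['magenta'])
```

With `ElementTree`, attributes are read with `.get()` and the enclosed text with `.text`, which play the roles of the `'@name'` and `'#text'` keys that `xmltodict` produces.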


#### 1.4.2.7 Exercise: Load a CSV with a header
Use the `csv.reader()` function to load the `APPL.csv` stock prices spreadsheet. Print the first 10 rows and infer what the columns mean from the header. Note: the stock history for AAPL was retrieved from [Yahoo](https://finance.yahoo.com/quote/AAPL/history?p=AAPL). How would you use the header information separately in a data structure to ease access to columns?

In [20]:
## code here
import csv
# csv.reader() accepts a "delimiter" argument for files separated by a character other than a comma.
# csv.reader() has no "header" argument: this file's first row is the header, and it comes back as an ordinary row.
with open("data/APPL.csv", "r", newline="") as fh:
    APPL = list(csv.reader(fh))

# print the first ten rows (the header plus nine rows of prices)
print(APPL[:10])


[['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'], ['2016-07-13', '97.410004', '97.669998', '96.839996', '96.870003', '25892200', '95.442123'], ['2016-07-12', '97.169998', '97.699997', '97.120003', '97.419998', '24167500', '95.984011'], ['2016-07-11', '96.75', '97.650002', '96.730003', '96.980003', '23794900', '95.550502'], ['2016-07-08', '96.489998', '96.889999', '96.050003', '96.68', '28912100', '95.254921'], ['2016-07-07', '95.699997', '96.50', '95.620003', '95.940002', '25139600', '94.525831'], ['2016-07-06', '94.599998', '95.660004', '94.370003', '95.529999', '30949100', '94.12187'], ['2016-07-05', '95.389999', '95.400002', '94.459999', '94.989998', '27705200', '93.589829'], ['2016-07-01', '95.489998', '96.470001', '95.330002', '95.889999', '26026500', '94.476565'], ['2016-06-30', '94.440002', '95.769997', '94.300003', '95.599998', '35836400', '94.190838']]
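
One possible answer to the closing question, sketched under the assumption that the header is the first row: split it off and build a column-name-to-position dictionary, or let `csv.DictReader` do the equivalent. The inline `sample_csv` string reuses the first three lines of the output above so the sketch runs without the file; `col_index` and `dict_rows` are illustrative names.

```python
import csv
import io

# First three lines of the output above, inlined so no file is needed.
sample_csv = """Date,Open,High,Low,Close,Volume,Adj Close
2016-07-13,97.410004,97.669998,96.839996,96.870003,25892200,95.442123
2016-07-12,97.169998,97.699997,97.120003,97.419998,24167500,95.984011
"""

# Option 1: separate the header and map each column name to its position.
rows = list(csv.reader(io.StringIO(sample_csv)))
header, records = rows[0], rows[1:]
col_index = {name: i for i, name in enumerate(header)}

# csv.reader yields strings, so convert before doing arithmetic.
closes = [float(r[col_index['Close']]) for r in records]
print(closes)

# Option 2: csv.DictReader consumes the header itself and yields one dict per row.
dict_rows = list(csv.DictReader(io.StringIO(sample_csv)))
print(dict_rows[0]['Date'], dict_rows[0]['Volume'])
```

The index dictionary keeps rows as compact lists, while `DictReader` trades some memory for rows that are self-describing.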


## Additional In-depth Exercises
### A. Set up Google Colaboratory in your browser and connect Google Drive


1. Create a Gmail/Google account
2. Log into Google Drive
3. Create a directory for your Colaboratory (Jupyter) notebooks
4. Navigate the drop-down menu: new > more > connect more apps
5. Search for the Colaboratory app and authorize its installation with Google (make Colab the default app for files it can open)
6. To create a new notebook, navigate the drop-down menu: new > more > Google Colaboratory
7. To upload/open an existing notebook, navigate the drop-down menu:  
  - new > File Upload or 
  - new > Folder Upload


### B. Set up Google Colaboratory in your browser and connect Google Drive

1. To work with data, you'll need a copy in your own Drive, which you then mount in the notebook:
  ```
  from google.colab import drive
  drive.mount('/content/gdrive')
  ```
2. When executing this for the first time, follow Google's dialogue to connect the Google Drive File Stream (don't forget to copy the provided code into the Colab notebook where you're working)
3. Load your data, reading from: `/content/gdrive/My Drive/path/to/data`
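
Once the drive is mounted, file paths can be built from the mount point rather than typed out each time. The sketch below is only an illustration: `DRIVE_ROOT` and `drive_path` are hypothetical names, and `path/to/data` is the same placeholder used in step 3, to be replaced with your own location. It does not call `google.colab`, so it also runs outside Colab, where it simply reports that the path is missing.

```python
from pathlib import Path

# Mount point from the steps above plus the 'My Drive' folder (hypothetical constant name).
DRIVE_ROOT = Path('/content/gdrive/My Drive')

def drive_path(relative, root=DRIVE_ROOT):
    """Return the full path of a file or folder located under the mounted Drive."""
    return Path(root) / relative

# 'path/to/data' is the placeholder from step 3; substitute your own location.
data_path = drive_path('path/to/data')

if data_path.exists():
    print('found:', data_path)
else:
    print('not found (is the drive mounted?):', data_path)
```

Checking `exists()` before opening a file gives a clearer message than a bare `FileNotFoundError` when the mount step was skipped.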

### C. Find some data and load it into your Google Drive/Colab