# If you're writing a lot of code - you're doing it wrong


### Ryan Kazmerik
* Data Scientist, Encana Corporation
* Mount Royal University, Bachelor CIS (2011)
* Wilfrid Laurier University, Master MAC (2019)

## Let's start with our first data representation: comma-separated values (CSV). We'll use the built-in Python library 'csv' to read the contents of the file:

In [None]:
from google.colab import files
uploaded = files.upload()

In [None]:
import csv

# newline='' lets the csv module handle line endings inside quoted fields
with open('articles.csv', encoding="utf8", newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    
    for row in csv_reader:
        print(", ".join(row), end='\n\n')


## CSV is a great storage format: compact and readable, but a little clumsy to work with.

## Let's convert this CSV into another data structure: List

In [None]:
articles_list = []

with open('articles.csv', encoding="utf8", newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    
    for row in csv_reader:
        articles_list.append(row)
        
print("Total number of articles:", len(articles_list)-1)
print("Total number of columns:", len(articles_list[0]), end='\n\n')

print("See the 50th article:")
print(articles_list[50])
print()
print()

print("Print the first 10 titles:")
print()

# Row 0 is the header, so the first 10 articles are rows 1 through 10
titles = [a[4] for a in articles_list[1:11]]

for t in titles:
    print('   ',t, end='\n\n')

## With a list, we can easily get some basic stats on the articles, iterate through the items and build custom ranges.
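
As a small illustration of those list operations, the sketch below parses a made-up three-column CSV from a string and then counts, indexes, and slices the result. The column names and rows are invented for the example and are not taken from articles.csv.

```python
import csv
import io

# Hypothetical sample data; the real articles.csv has different columns.
sample_csv = """id,title,source
1,Solar farm opens,Example News
2,Panel prices fall,Example Daily
3,Grid storage grows,Example Wire
"""

rows = list(csv.reader(io.StringIO(sample_csv)))

# The first row holds the column names, the rest are data rows
header, records = rows[0], rows[1:]

# Basic stat: number of data rows
article_count = len(records)

# Look the column up by name instead of hard-coding its position
title_index = header.index("title")

# Custom range via slicing: titles of the first two articles
first_two_titles = [r[title_index] for r in records[:2]]

print(article_count)
print(first_two_titles)
```

Looking up the column position with `header.index(...)` is a design choice here: it keeps the code working if the column order in the file changes.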

## But if we want to add a new property (e.g. sentiment), lists can be difficult to work with, so it's best to convert our list items into objects (dictionaries), which map directly to JSON

## We'll use the built-in Python library 'json' for this task

In [None]:
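import csv
import json

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA

# VADER needs its lexicon file; the download is skipped if it is already present
nltk.download('vader_lexicon')
analyzer = SIA()

# DictReader turns each row into a dict keyed by the header row
with open('articles.csv', encoding="utf8", newline='') as csv_file:
    articles_json = list(csv.DictReader(csv_file))

for article in articles_json:
    # 'compound' is VADER's overall score, from -1 (negative) to +1 (positive)
    sentiment = analyzer.polarity_scores(article['description'])
    article['sentiment'] = sentiment['compound']

print(json.dumps(articles_json, indent=2))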
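
To show why dictionaries make it easy to add a property, here is a minimal sketch on invented data. It uses a word count as a stand-in for the sentiment score so that it runs without NLTK; the sample rows and the `word_count` key are assumptions made for the example.

```python
import csv
import io
import json

# Hypothetical sample data with the same 'description' column used above
sample_csv = """title,description
Solar farm opens,A new solar farm starts producing power
Panel prices fall,Cheaper panels reach the market
"""

# DictReader uses the header row as keys, so each row becomes a dict
articles = list(csv.DictReader(io.StringIO(sample_csv)))

for article in articles:
    # Placeholder property (word count) standing in for a real sentiment score
    article["word_count"] = len(article["description"].split())

# A list of dicts serializes to a JSON array of objects with no extra work
as_json = json.dumps(articles, indent=2)
print(as_json)
```

With a plain list row, the new value would have to be appended at a position that every later line of code needs to remember; with a dict, it is addressed by name.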


## Now that we have a sentiment score for each article, let's produce some aggregations - what if we wanted to see the average sentiment per day?

## We'll use a popular library called Pandas for this, and a data representation: Data Frame

In [None]:
import pandas as pd
from pandas import json_normalize  # the older pandas.io.json import path is deprecated

# json_normalize flattens the list of dicts into a table, one column per key
df = json_normalize(articles_json)

df['day'] = df['publishedAt'].str.split('T').str[0]
df = df.groupby(['day']).agg({'sentiment':"mean",'description': "count"})
df.columns = ["avg_sentiment", "doc_count"]

print(df)

aggs = df.reset_index().to_dict(orient='index')
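
To see the grouping step in isolation, the sketch below runs the same kind of aggregation on three made-up records. The timestamps and scores are invented; only the field names `publishedAt`, `description`, and `sentiment` come from the notebook. It uses pandas named aggregation, which sets the output column names in the same call.

```python
import pandas as pd

# Made-up records with the three fields the grouping relies on
records = [
    {"publishedAt": "2019-03-01T08:00:00Z", "description": "a", "sentiment": 0.5},
    {"publishedAt": "2019-03-01T17:30:00Z", "description": "b", "sentiment": -0.1},
    {"publishedAt": "2019-03-02T09:15:00Z", "description": "c", "sentiment": 0.3},
]

toy = pd.DataFrame(records)

# Keep only the date part of the ISO timestamp
toy["day"] = toy["publishedAt"].str.split("T").str[0]

# One row per day: mean sentiment and number of articles
daily = toy.groupby("day").agg(
    avg_sentiment=("sentiment", "mean"),
    doc_count=("description", "count"),
)

print(daily)
```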

## Now let's compare it with the stock price of one of the top solar energy producing companies in North America: Vivint Solar

## We can use the 'requests' library to make an API call to a stock feed service

In [None]:
import requests

api = 'https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=VSLR&apikey=KUTLFACJXW9LIKLO'

stocks = json.loads(requests.get(api).text)

print(json.dumps(stocks, indent=2))
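
The response is a nested dictionary, so pulling out the closing prices is a matter of walking its keys. The sketch below does that on a hand-written stand-in for the response: the keys `Time Series (Daily)` and `4. close` are the ones this notebook reads, while the dates, the prices, and the helper name `closing_prices` are invented for illustration. It also assumes that a failed or throttled request comes back without the time series key.

```python
# Hand-written stand-in for the API response; the numbers are invented.
sample_response = {
    "Time Series (Daily)": {
        "2019-03-04": {"4. close": "5.25"},
        "2019-03-01": {"4. close": "5.12"},
    }
}


def closing_prices(response):
    """Return {date: closing price as float}, ordered by date (hypothetical helper)."""
    series = response.get("Time Series (Daily)")
    if series is None:
        # Assumption: an error or rate-limit reply lacks the time series key
        raise ValueError(f"unexpected response keys: {list(response)}")
    # Prices arrive as strings, so convert them before doing any arithmetic
    return {day: float(values["4. close"]) for day, values in sorted(series.items())}


closes = closing_prices(sample_response)
print(closes)
```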


## Let's add this data to our Data Frame of articles and compare the sentiment and stock price for the last 100 days

In [None]:
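prices = stocks["Time Series (Daily)"]

# Build a price table indexed by day so it lines up with the daily sentiment table
price_df = pd.DataFrame(
    {"price": {day: float(v['4. close']) for day, v in prices.items()}}
)
price_df.index.name = "day"

# Join on the day index rather than by position, so each price is paired
# with the sentiment of the same date; days missing from either side are dropped
df2 = df.join(price_df, how="inner")

print(df2.head())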
    

## This could be an interesting dataset, but it would help to visualize the data to identify potential trends and correlations

## But that's a whole other lecture...
<br/>

**To recap this notebook, we used the following data representations:**
* CSV
* List
* JSON
* DataFrame

**And the following libraries:**
* csv (python)
* json (python)
* requests (python)
* vader (nltk)
* pandas



## Happy coding!