# Python Review

What we expect you to know already. If this is all new, ask for help during exercises.

## Data types
- **int** (1, 10, -25)
- **float** (0.1, 3.14)
- **string** ("Hello, World!")
- **bool** (True, False)

## Data structures

- **list** An ordered sequence of elements
- **dictionary** A key/value store (insertion-ordered since Python 3.7). Each value can be accessed by indexing with its key, e.g. `python_dict['a']`
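
As a quick illustration, the sketch below builds one list and one dictionary and reads a value back out of each. The names `fruits` and `prices` are made up for this example and are not used elsewhere in the notebook.

```python
# Hypothetical example data, not used elsewhere in this notebook
fruits = ['apple', 'banana', 'cherry']
prices = {'apple': 0.5, 'banana': 0.25}

# Lists are indexed by position, starting at 0
first_fruit = fruits[0]

# Dictionaries are indexed by key
apple_price = prices['apple']

print(first_fruit, apple_price)
```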

## Operators

In [None]:
1 + 1  # 2

12 / 4  # 3.0

1 == 2  # False

10 != 0  # True

not False  # True

True and False  # False

False or True  # True

## Variables

In [None]:
x = 5

x + 10

x = 0

## Control Statements

Remember, whitespace matters in Python: indentation decides which lines belong to an `if`, `for`, or `while` block.
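
A minimal sketch of how indentation controls which lines belong to a block. The variable names `value` and `messages` are made up for this illustration.

```python
value = 5
messages = []

if value > 10:
    # Both indented lines belong to the if block, so neither runs when value is 5
    messages.append('value is big')
    messages.append('still inside the if block')

# This line is not indented, so it runs regardless of the condition
messages.append('always runs')

print(messages)
```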

In [None]:
x = 20

if x == 3:
    print('x is 3!')
elif x > 0:
    print('x is greater than 0!')
else:
    print('x is something else!')
    
for number in [1, 2, 3, 4, 5]:
    print(number)
    
my_dict = {'key1':'val1', 'key2':'val2'}
    
for key in my_dict:
    print(key)
    print(my_dict[key])
    
while x > 0:
    print(x)
    x = x - 1

## Exercise 1

In [None]:
#If you are stuck, google or ask for help!

python_list = [1,2,3,7]
python_dict = {'a':3, 'd':5}

# Print each number in the list

# INSERT CODE HERE

# Print each value in the dictionary

# INSERT CODE HERE

# JSON

A way to store data (lists, dictionaries, etc.) in a string or file format. An example and an exercise are below.

In [None]:
import json

json_example = '''
{
    "created_at": "Wed Nov 06 23:58:55 +0000 2019",
    "text": "Kleenex® Brand Underscores Importance of #Skin Care This Cold &amp; Flu Season  #skincare #beautytips  https://t.co/PWqB6MmQVw",
    "is_quote_status": false,
    "quote_count": 0,
    "reply_count": 0,
    "retweet_count": 3,
    "favorite_count": 5,
    "entities": {
        "hashtags": [
            {
                "text": "Skin",
                "indices": [41,46]
            },
            {
                "text": "skincare",
                "indices": [80,89]
            },
            {
                "text": "beautytips",
                "indices": [90,101]
            }
        ]
    },
    "favorited": false,
    "retweeted": false
}
'''
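
As a smaller illustration of the round trip between JSON text and Python objects, here is a sketch on a made-up string (`small_json` is not Twitter data). It uses only `json.loads` and `json.dumps` from the standard library.

```python
import json

# A small, made-up JSON string (not Twitter data)
small_json = '{"name": "example", "tags": ["a", "b"], "active": true}'

# json.loads parses a JSON string into Python objects
parsed = json.loads(small_json)
print(type(parsed))        # a Python dict
print(parsed['tags'][1])   # a list nested inside the dict

# json.dumps goes the other way, from Python objects to a JSON string
as_text = json.dumps(parsed)
print(as_text)
```

Note that JSON `true`/`false` become Python `True`/`False` after loading.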

## Exercise 2

In [None]:
# Load the JSON example above (don't forget to run the cell) using
# the json.loads() function
# Remember, this will turn the json into a python dictionary

# INSERT CODE HERE

# Print how many times the tweet was retweeted ("retweet_count")

# INSERT CODE HERE

# Print the hashtags in the tweet

# INSERT CODE HERE

# Don't repeat yourself

A common programming tip is to not repeat code. If you find yourself copying and pasting the same code over and over, it's best to make a function. That way, when the code needs to change, you only change it in one place. Below is an example of a function.

In [None]:
# This function adds one to the value that was passed in and returns the result
def add_one(a):
    b = a + 1
    return b
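
A short usage sketch for the function above. It is repeated here only so the snippet runs on its own; `result` and `results` are made-up names.

```python
# Same function as above, repeated so this snippet runs on its own
def add_one(a):
    b = a + 1
    return b

# Call the function and keep the returned value
result = add_one(4)
print(result)

# The same function can be reused with different inputs
results = []
for n in [1, 2, 3]:
    results.append(add_one(n))
print(results)
```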

## Exercise 3

Write a function that takes the example JSON string above and returns the retweet count

In [None]:
# Write your function here

# Saving and Loading JSON to a file

It is often inconvenient to request the same JSON from a URL over and over: the website may be slow, or it may limit how many times a user can access it. In these cases, it's best to save the JSON to a file and read it from there. Below is an example of writing to a file.

In [None]:
with open('myfile.txt', 'w') as test_file:
    test_file.write('I am a text file!')
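
Reading works the same way with mode `'r'`. The sketch below writes and then reads back the same text; it assumes a temporary folder (via the standard `tempfile` module) so that it does not touch your own files.

```python
import os
import tempfile

# Work in a temporary folder so this sketch does not touch your own files
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'myfile.txt')

    # 'w' opens the file for writing
    with open(path, 'w') as test_file:
        test_file.write('I am a text file!')

    # 'r' opens the file for reading
    with open(path, 'r') as test_file:
        contents = test_file.read()

print(contents)
```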

## Exercise 4

In [None]:
# Save the JSON to a file named 'example.json'

# INSERT CODE HERE

# Load the JSON back from 'example.json' (using json.load)

# INSERT CODE HERE

# Accessing the Twitter API

We are going to access www.twitter.com and use its search feature, the same way you would by hand, but with code.


## Package Requirements

We need to install the following packages into your Python environment

In [None]:
# Run this cell to install the needed packages
import sys
!{sys.executable} -m pip install requests requests_oauthlib stylecloud pandas matplotlib

## What is an API?

An API (Application Programming Interface) is a way for a website or app to let a user access its data programmatically.

## Other ways of accessing data

1. Copy and paste data into a spreadsheet (or plain text file) and read with Python package `csv` or `pandas` (or if it's a text file, then loop through the lines) 

1. Scrape/parse the HTML with Python package *Beautiful Soup* (or equivalent)
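
The first option can be sketched with the standard `csv` module. The data below is made up, and `io.StringIO` stands in for a real file so the snippet runs on its own.

```python
import csv
import io

# Made-up data, as if pasted from a web page into a spreadsheet and saved as CSV
pasted_text = """city,tweets
chicago,12
detroit,7
"""

# io.StringIO lets csv read from a string as if it were a file
reader = csv.DictReader(io.StringIO(pasted_text))
rows = list(reader)

for row in rows:
    print(row['city'], row['tweets'])
```

Values read this way are strings, so numbers need to be converted with `int()` or `float()` before doing arithmetic.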

## API vs. Other Methods

An API is more organized, and it is the way the app wants you to access its data. However, many websites and apps don't have an API, or the one they have is too poorly designed to use. In those cases, it's better to use a cruder approach.

# API Authentication

We need to get the tokens from the Twitter development website, but first there are some steps involved and some forms to fill out.

## Exercise 5

1. Go to https://developer.twitter.com/en/apply/user.html

1. Login to your Twitter account or create a new user

1. Fill out the forms and select that you are applying for educational use. It doesn't really matter what you write down as long as you meet the word length. Go through the entire process.

1. After that is complete, click on your name at the top right corner, next to *Dashboard* (not *Dashboard* itself but next to it) and select *Apps*, then click on *Keys and Tokens*

1. Write down the four keys below

In [None]:
# Write down your api keys here
API_KEY = ''
API_SECRET = ''
ACCESS_TOKEN = ''
ACCESS_TOKEN_SECRET = ''

## Exercise 6

Now go here to create a new dev environment https://developer.twitter.com/en/account/environments and select *Full Archive* (Search API). Write down the name of your environment below.

In [None]:
# Write the environment name below
DEV_ENVIRONMENT_NAME = ''


# Twitter Search API

Now we are ready to use _requests_ to access the api. _requests_ is a Python package used to access a URL and get the website data back.

In [None]:
import requests
from requests_oauthlib import OAuth1

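# The environment created in Exercise 6 is a Full Archive one, so use the fullarchive endpoint
SEARCH_API_URL = 'https://api.twitter.com/1.1/tweets/search/fullarchive/{}.json'.format(DEV_ENVIRONMENT_NAME)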

# Authentication uses OAuth1 as referenced in Twitter documentation
auth = OAuth1(API_KEY, API_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

We are now ready to send a search query to Twitter and get back the JSON response.

## Exercise 7

Use the documentation here https://developer.twitter.com/en/docs/tweets/search/quick-start/premium-full-archive to figure out how to format your JSON data that you are sending to twitter. The data will be a python dictionary that the _requests_ library will convert to a JSON for you when sending the data.

In [None]:
query_data = {
    # INSERT CODE HERE
}


response = requests.post(SEARCH_API_URL, auth=auth, json=query_data)

## Exercise 8

Now that we have sent the request, we need to get the result back into a Python data type so we can read it. Use the documentation linked in the cell below.

In [None]:
# Get the tweets from the requests response object
# See documentation at this link https://www.w3schools.com/python/ref_requests_response.asp
tweets = # INSERT CODE HERE
print(len(tweets['results']))

# The first result in the results list
tweets['results'][0]

# Query Operators

Operators can be added to the query to narrow the search, for example to filter the results by location. All operators are listed here https://developer.twitter.com/en/docs/tweets/rules-and-filtering/overview/premium-operators

For example if we wanted to search for _flu_ in *Chicago* we would have this code `"query":"flu place:chicago"` in our JSON
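
A minimal sketch of building that query in Python. The helper `build_query` and the name `example_query` are hypothetical, introduced only for this illustration; `json.dumps` shows the JSON text that would end up in the request body.

```python
import json

# Hypothetical helper: combine a search term with a place operator
def build_query(term, place):
    return {'query': term + ' place:' + place}

example_query = build_query('flu', 'chicago')
print(example_query)

# This is the JSON text that would be sent in the request body
print(json.dumps(example_query))
```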

## Exercise 9

Make another search, this time using one of the operators linked above to get a more specific result.

In [None]:
# INSERT CODE HERE

## Exercise 10

Collect the text of all the tweets into one Python string. Use the sample code below.

In [None]:
corpus = ''
for tweet in tweets['results']:
    corpus = #INSERT CODE HERE

corpus

## Exercise 11

We are going to convert tweets into text several times so we can process the text from different searches. Since we don't want to repeat ourselves, write a function below that takes the query JSON and returns the text from all the tweets.

In [None]:
def get_text_from_search(query_data):
    '''INSERT CODE BELOW'''


# Word Cloud

Now that we have some tweet results, we are going to collect all the text into one python string so we can make a word cloud

There are a few Python libraries that convert text to a word cloud. This one allows us to specify any shape from https://fontawesome.com/ so we will use the Twitter logo.

In [None]:
import stylecloud
from IPython.display import Image
import os

filename = 'wordcloud.png'

stylecloud.gen_stylecloud(text=corpus, icon_name='fab fa-twitter', output_name=filename)
Image(filename=filename)

## Exercise 12

Many of the sites whose data we want to analyze do not have APIs. Can you generate the word cloud using a cruder method? (Hint: One way is to copy and paste the text from Twitter into a string and then pass that into the word cloud.)

In [None]:
# INSERT CODE HERE

# Data Cleaning and Analysis

Now we will clean the text. We didn't have to before because the word cloud package did that for us. Now that we are doing a more specific analysis, we have to clean the data ourselves.

## Exercise 12

Convert the text we have collected into lowercase

In [None]:
corpus_lower = corpus # replace with your CODE HERE

We will replace newline characters with spaces and remove every character that is not a lowercase letter or a space.

In [None]:
import re
corpus_no_chars = ' '.join(corpus_lower.splitlines())
corpus_no_chars = re.sub(r'[^a-z ]', '', corpus_no_chars)
corpus_no_chars
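
To see what these two steps do, here is the same cleaning applied to a short made-up string (`sample` is not real tweet data). It also shows why lowercasing has to come first: the pattern `[^a-z ]` removes capital letters too.

```python
import re

# Made-up tweet text containing a newline, punctuation, a hashtag and digits
sample = 'flu season is here!\n#staysafe 2019'

# Replace newlines with spaces, then drop everything except a-z and spaces
one_line = ' '.join(sample.splitlines())
cleaned = re.sub(r'[^a-z ]', '', one_line)
print(repr(cleaned))

# Capital letters are removed as well, so lowercase the text before cleaning
without_lowering = re.sub(r'[^a-z ]', '', 'Flu')
print(repr(without_lowering))
```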

## Bar Plot

We will now make a bar plot of the word occurrences in the tweets, but first we need to create a function to count the word occurrences.

## Exercise 13

Make a function that, given the tweet text and a word, cleans the text (using the code above) and counts the occurrences of that word.

Use `str.count` to count the words. For example `'hello world'.count('hello')` will return `1`.
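
One caveat worth knowing: `str.count` counts substrings, not whole words. The sketch below, on a made-up sentence, compares it with counting after `str.split`; which behavior you want for the exercise is up to you.

```python
text = 'the weather in other cities'

# str.count counts substrings, so 'the' is also found inside
# 'weather' and 'other'
substring_count = text.count('the')

# Splitting on whitespace first counts whole words only
whole_word_count = text.split().count('the')

print(substring_count, whole_word_count)
```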

In [None]:
def clean_and_count(tweet_text, word):
    # INSERT CODE HERE


# Pandas

Pandas is a powerful python package for reading csv files, storing csv files, and analyzing or changing the data in them.

In addition to reading in a DataFrame from a csv file, we can also make our own from a dictionary, as in the following example.

In [None]:
import pandas as pd

data = {'place':['chicago', 'detroit'], 'trump':[3,4], 'hilary':[5,3]}
df = pd.DataFrame(data)
df

## Exercise 14

Play around with the pandas syntax. This guide is a good starting point: https://jalammar.github.io/gentle-visual-intro-to-data-analysis-python-pandas/

How do you print only the `place` column? How do you save it and read from csv?

In [None]:
# INSERT CODE HERE

## Exercise 15

Construct a dict to create a pandas dataframe similar to the example above. Use `clean_and_count` and fill the code below. Don't forget to fill out the place as well.

In [None]:
places = ['chicago', 'texas', 'alabama', 'california']
words = ['trump', 'barr', 'stone', 'schiff', 'clinton', 'obama']

# Initialize the dictionary of lists once, before looping over the places
data = {'place': []}
for word in words:
    data[word] = []

for place in places:
    # get_text_from_search takes the query dictionary (see Exercise 11)
    tweet_text = get_text_from_search({'query': 'impeachment place:' + place})

    for word in words:

        # INSERT CODE HERE
        

df = pd.DataFrame(data)


# Plotting

Now we are ready to plot!

In [None]:
df.plot(kind='bar',x='place',y=words)

# Exercise 16: Occurrences of All Words Dataframe

Using what we've learned so far, make a `pandas` dataframe with two columns: one listing all the words, and one with how many times each word appears. What `pandas` command will give you the top 5 words?
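
One possible way to get the per-word counts before building the dataframe is `collections.Counter` from the standard library. This is only a sketch on made-up text (`sample_text` stands in for the cleaned corpus), and it leaves the pandas part of the exercise to you.

```python
from collections import Counter

# Made-up, already cleaned text standing in for the tweet corpus
sample_text = 'flu shot flu season cold flu season'

# Count how often each whitespace-separated word appears
word_counts = Counter(sample_text.split())

# most_common(n) returns the n most frequent (word, count) pairs
top_two = word_counts.most_common(2)
print(top_two)
```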