## Python and Jupyter Notebook Tips:

This is a Jupyter Notebook. You can interact with it in two ways:

1. You can open the Binder version, which is hosted on a cloud server.
2. You can run it on your own machine through JupyterLab:
	- Download this notebook and this folder of data.
	- I encourage you to learn to create Jupyter notebooks on your own machine. This will give you a little more control over writing and saving your own Python code.

### Pro Tips:
- Running a cell in JupyterLab: Click on the cell, then click ► (the "Run" icon) in the menu at the top of this notebook.
- `Tab` completion.
    - Like the command line, Python uses tab completion.
    - Pressing the `Tab` key on your keyboard will let you search for any variables that you've already defined, as well as matching functions or modules within Python.
- Run cells in order!
    - Python executes code in the order that it's written. This means that some parts of code will depend on parts written earlier. If you get an error, it may mean that you simply haven't defined a variable or function. Make sure to run code in the sequence it's written.
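
As a small illustration of the "run cells in order" tip, the sketch below imitates what happens when a later cell runs before the cell that defines its variable. The name `number_of_words` is hypothetical and used only for this example.

```python
# Imitate running a "later" cell first: the variable has not been defined yet,
# so Python raises a NameError. We catch it here only so the sketch keeps running.
try:
    print(number_of_words * 2)
    error_message = None
except NameError as error:
    error_message = str(error)
    print("NameError:", error_message)

# Now run the "earlier" cell that defines the variable...
number_of_words = 40

# ...and the later cell works as expected.
doubled = number_of_words * 2
print(doubled)
```

If you see a `NameError` in a notebook, the usual fix is to go back and run the cell where that name is defined.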


------


## Example: Reading and analyzing texts:

Below is a chunk of Python code. These lines, when put together, do something simple yet important. They count and display the most frequent words in a text file. (If basic forms of text analysis like counting word frequency sound familiar, this is because what we'll be learning to do with Python builds on the kinds of commands you learned to do with the command line.)

The example below specifically counts and displays the 40 most frequent words in Virginia Woolf's *A Room of One's Own* (1929).

In [1]:
# word-frequencies.py

# Import Libraries and Modules

import re
from collections import Counter

# Define Functions

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    # Raw string (r"...") so that \W reaches the regex engine as written
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

# Define Filepaths and Assign Variables

filepath_of_text = "woolf-a-room-of-ones-own.txt"
number_of_desired_words = 40

stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',
 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',
 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',
 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',
 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',
 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',
 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 've', 'll', 'amp']

# Read in File

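# The with-statement closes the file automatically once it has been read
with open(filepath_of_text, encoding="utf-8", errors='ignore') as text_file:
    full_text = text_file.read()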

# Manipulate and Analyze File

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in stopwords]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

# Output Results

most_frequent_meaningful_words

[('one', 316),
 ('women', 194),
 ('would', 143),
 ('woman', 124),
 ('could', 104),
 ('mind', 102),
 ('like', 98),
 ('thought', 95),
 ('men', 89),
 ('say', 79),
 ('must', 78),
 ('even', 77),
 ('upon', 73),
 ('life', 65),
 ('said', 65),
 ('may', 64),
 ('perhaps', 63),
 ('great', 62),
 ('write', 60),
 ('two', 57),
 ('sex', 57),
 ('without', 56),
 ('think', 54),
 ('books', 53),
 ('might', 52),
 ('time', 51),
 ('shakespeare', 51),
 ('never', 46),
 ('man', 46),
 ('people', 46),
 ('book', 44),
 ('much', 44),
 ('made', 43),
 ('come', 43),
 ('fiction', 42),
 ('writing', 42),
 ('room', 41),
 ('something', 41),
 ('fact', 39),
 ('world', 38)]

Try removing the stopwords from the code cell above, then run it again and see how the results change.
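
To get a feel for what the stopword list does before changing the full example, here is a minimal sketch on a single sentence. The sentence and the short `sample_stopwords` list are made up for illustration; they are not taken from the Woolf text or from the full list above.

```python
import re
from collections import Counter

# A made-up sentence and a tiny stopword list, both only for illustration
sample_text = "The room of the mind is the room of one's own."
sample_stopwords = ["the", "of", "is", "s"]

# Same approach as the example above: lowercase, then split on non-word characters.
# Unlike the example above, this also drops the empty string that re.split
# leaves behind after the final period.
sample_words = [word for word in re.split(r"\W+", sample_text.lower()) if word]

# Tally once with every word, and once with the stopwords filtered out
with_stopwords = Counter(sample_words)
without_stopwords = Counter(
    word for word in sample_words if word not in sample_stopwords
)

print(with_stopwords.most_common(3))
print(without_stopwords.most_common(3))
```

With the stopwords left in, the most frequent word is "the"; once they are filtered out, "room" rises to the top.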

### Example: Reading and Analyzing Statistics

Let's read in two CSV files with Bourdieu's statistics:

In [3]:
import pandas as pd

In [9]:
# Read the two CSV files into DataFrames
bourdieu_dates_df = pd.read_csv('bourdieu-dates.csv', encoding='utf-8', parse_dates=True)
bourdieu_publishers_df = pd.read_csv('bourdieu-publishers.csv', encoding='utf-8', parse_dates=True)

In [8]:
bourdieu_dates_df

| Unnamed: 0 | date_of_birth | L'Express | Quinzaine_Litteraire |
|---|---|---|---|
| 0 | Born before 1900 | 4 | 7 |
| 1 | 1900-9 | 10 | 27 |
| 2 | 1910-19 | 17 | 15 |
| 3 | 1920-9 | 33 | 28 |
| 4 | 1930-9 | 11 | 15 |
| 5 | 1940-after | 5 | 5 |
| 6 | NR [not reported] | 12 | 9 |


In [10]:
bourdieu_publishers_df

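| Unnamed: 0 | Publishers | L'Express | Quinzaine_Litteraire |
|---|---|---|---|
| 0 | Gallimard | 8 | 34 |
| 1 | Seuil | 7 | 12 |
| 2 | Denoël | 3 | 6 |
| 3 | Flammarion | 11 | 5 |
| 4 | Grasset | 14 | 8 |
| 5 | Stock | 11 | 1 |
| 6 | Laffont | 18 | 3 |
| 7 | Plon | 1 | 4 |
| 8 | Fayard | 5 | 4 |
| 9 | Calmann-Levy | 1 | 2 |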
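
Once the data is in a DataFrame, a first step toward analyzing it might look like the sketch below. It assumes pandas is installed and inlines four rows copied from the publishers table above, so it runs without the CSV file. The names `sample_df`, `by_express`, and `difference` are hypothetical, chosen only for this example.

```python
import io

import pandas as pd

# Four rows copied from the publishers table above, inlined as CSV text
# so that this sketch does not depend on 'bourdieu-publishers.csv'
csv_text = """Publishers,L'Express,Quinzaine_Litteraire
Gallimard,8,34
Seuil,7,12
Grasset,14,8
Laffont,18,3
"""

sample_df = pd.read_csv(io.StringIO(csv_text))

# Sort the rows so the largest value in the L'Express column comes first
by_express = sample_df.sort_values("L'Express", ascending=False)
top_express_publisher = by_express.iloc[0]["Publishers"]

# Add a column comparing the two publications row by row
sample_df["difference"] = sample_df["Quinzaine_Litteraire"] - sample_df["L'Express"]

print(by_express)
print(sample_df)
```

The same two steps, `sort_values` and column arithmetic, should work on the full `bourdieu_publishers_df` as well, since it uses the same column names.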
