## Introduction to Python 

Python is a programming language that can be used in innumerable ways.

Here are some of the ways Python applies to journalism:

* Analyze and visualize data
* Scrape websites and PDFs
* Build and manage databases 

Some of the advantages of Python over Google Sheets include versatility (it's a full-blown programming language), reproducibility and shareability. Once you create a notebook, it's easy to go back and retrace each step of your work.

### Start with variables

We will start by learning about variables. A variable is like a container: you can assign a value to it.

<img src="https://multimedia.report/images/classes/coding/assignment.gif" alt="animation showing how variables work" style="margin-left:0">

*Whatever is on the right side will be stuffed into the left side.*

Let's try it. Let's stuff the number 7 into x. Put your cursor in the box below, hold the shift key down, and press return.

In [None]:
x = 7

Next, let's see the output. We'll type just "x" by itself and Python will output its value. In each of these boxes, put your cursor in the box, hold down the shift key on your keyboard, and press return.

In [None]:
x

Here's a shortcut: we can put both statements in one cell, so that when we set a variable, we immediately see its output. Let's try:

In [None]:
k = 5
k

This is common in the tasks we'll do later. We'll list a variable by itself immediately after we set it so that it will show us the output.

Here is something else we can do: replace the value in a variable. When we do, the variable keeps its most recent assignment.

In [None]:
j = 3
j = 5
j

Notice how j is equal to 5 and not 3. We replaced its value by assigning a new one.

By the way, how is that x doing?

In [None]:
x

Yes, it's still 7. So variables will carry on from other cells. Once we run the cell (box) it's set. Go back up to the first cell we ran (where we set it to 7) and change the value to 10 and run it again. Then come back here and run the cell below.

In [None]:
x

So, the order in which we run cells matters. Python uses whatever value a variable was most recently given, no matter where the cell sits on the page. Notice those numbers beside each cell? They show the order in which we ran the cells, so we can keep track of what was done.

Next, let's see how we can assign variables to other variables.

In [None]:
x = 3
y = x
y

Note how we assigned 3 to x, then we assigned the *value* of x to y. Why didn't the output simply say "x"? Because x without quotes is a variable, so Python looks up its value. The letter in quotes, "x", is a different datatype: a string. Let's try it one more time.

In [None]:
x = 3
y = "x"
y

Ah hah! We assigned text string "x" to y, so y now stores the value of a piece of text. If we left the quotes off, it would assign the *value* of x rather than the string "x". Important distinction.
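If you're ever unsure which kind of value a variable holds, Python's built-in `type()` function will tell you. Here is a minimal sketch that reuses the x and y from above; the names `x_type` and `y_type` are just for illustration.

```python
x = 3      # a whole number (integer)
y = "x"    # a piece of text (string)

# type() reports the datatype of whatever a variable currently holds
x_type = type(x)
y_type = type(y)

print(x_type)
print(y_type)
```

The first line of output names the integer type (`int`) and the second names the string type (`str`).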

Here are some other things we can do:

In [None]:
x = 2 + 2
x

In [None]:
x = 5
y = 1
j = x + y
j

Let's try something different. What would happen if we added two strings together? Make a guess before we run the code below. Then run it to see if you were right.

In [None]:
firstname = "Jeremy"
lastname  = "Rue"
fullname  = firstname + lastname
fullname
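Adding strings joins them end to end, with nothing in between. If you want a space between the two names, you have to add it yourself. The sketch below shows two ways that should work; the `fullname_f` variable and the f-string version are additions for illustration, not part of the exercise above.

```python
firstname = "Jeremy"
lastname = "Rue"

# Put a one-space string between the two names
fullname = firstname + " " + lastname

# An f-string builds the same text by filling in the variables
fullname_f = f"{firstname} {lastname}"

print(fullname)
print(fullname_f)
```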

### Lists

There is a special type of variable in Python called a list. It's pretty cool. It gives us the ability to store multiple values in a single variable. Let's take a look:

In [None]:
months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November"]
months

Notice the output simply lists the values in the list. How do we recall a value? We use its position, called an **index**. Python uses **zero-based indexing**, which means the first value is at index 0, the second at index 1, and so on.

Take a guess, what will the following output?

In [None]:
months[0]

What about this one?

In [None]:
months[1]

OK, one more time... before running this code, take a guess, what will this output?

In [None]:
months[5]

There are also built-in functions and special commands called "methods" that we can run on lists.

We can ask this list how long it is:

In [None]:
len(months)

We can also append (add to) the list, even though we already assigned the values.

In [None]:
months.append('December')
len(months)

We can also extract a portion, or slice, of this list using special notation.

In [None]:
months[0:3]

Note that the first number (0) refers to the first item, but the slice stops just before the second number: `months[0:3]` returns the items at indexes 0, 1 and 2, not the item at index 3. Here is a nifty guide to refer to if you ever forget:

```python
a[start:stop] # items start until stop - 1
a[start:]     # items start then go through the rest of the list
a[:stop]      # items from the beginning until stop - 1
a[:]          # a copy of the whole list

a[start:stop:step] # from start up to (but not including) stop, taking every step-th item
```
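To make the guide concrete, here is a small sketch that applies each pattern to a short, made-up list called `letters` (a hypothetical stand-in, so the results are easy to check by eye).

```python
# A short, made-up list of six items at indexes 0 through 5
letters = ["a", "b", "c", "d", "e", "f"]

first_three = letters[0:3]     # indexes 0, 1, 2
from_third = letters[2:]       # index 2 through the end
up_to_two = letters[:2]        # indexes 0, 1
every_other = letters[0:6:2]   # indexes 0, 2, 4

print(first_three)
print(from_third)
print(up_to_two)
print(every_other)
```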

Try a few of these from above yourself in the next cell:

In [None]:
months[2:5]  # change the numbers to try your own slice

### Dictionaries

The next datatype is dictionaries. Dictionaries are similar to JSON, which we'll go over later in class. 

The important thing to know about dictionaries is they store values by properties called "keys" (similar to CSS).

In [None]:
# Curly braces! Not brackets 

my_dict = { 
    "city": "Berkeley", 
    "state": "California", 
    "county": "Alameda", 
    "zip": 94720
}

my_dict

We can recall any value by its keys using notation similar to lists, but putting the property string instead of an index.

In [None]:
my_dict["city"]

In [None]:
my_dict["zip"]

In [None]:
my_dict.keys()

In [None]:
my_dict.values()
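Dictionaries can also be changed after they're created. The sketch below rebuilds the same dictionary and then adds and replaces values; the "country" and "mayor" keys and the "unknown" fallback are made up for illustration.

```python
my_dict = {
    "city": "Berkeley",
    "state": "California",
    "county": "Alameda",
    "zip": 94720
}

# Assigning to a key that doesn't exist yet adds it
my_dict["country"] = "USA"

# Assigning to an existing key replaces its value (here, a number becomes text)
my_dict["zip"] = "94720"

# .get() returns a fallback value instead of an error when a key is missing
mayor = my_dict.get("mayor", "unknown")

print(my_dict["country"], my_dict["zip"], mayor)
```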

## Importing Libraries

Okay, now that we have all of that out of the way, we will do some data stuff. The first thing we need to do is import some libraries. We will use the popular Pandas library, and also import Seaborn for some basic data visualizations. These libraries were installed prior to this lesson, so you don't have to install them. But if you were doing this on your own, you'd need to install them before running the code below.

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Now that we've imported our libraries, let's bring in our .csv file using Pandas. Notice above that our Pandas library was given the short name "pd" (short for Pandas). Technically, we could have chosen any name for it, but pd is the standard convention, so let's stick with that.

In [None]:
pulitzers = pd.read_csv("pulitzer-circulation-data.csv", encoding='utf-8')

Now, let's take a look and see what it looks like. We can run the command `head()` on the variable to see the first five rows of data.

In [None]:
pulitzers.head()

We can also put a number in the parentheses to choose how many rows to show.

In [None]:
pulitzers.head(10)

### Interview Functions

Let's interview our data. Here are some functions we can run. The "df" is a placeholder for your data. It stands for "data frame," which is what we call this datatype when using Pandas.

`df.head()` - get the first 5 rows of your data (or specify number)

`df.tail()` - get the last 5 rows of your data (or specify number)

`df.sample(5)` - get a random sampling of 5 rows of your data

`df.columns` - get a list of all the columns

`df.info()` - get number of rows with data and data type for each column 

`df.shape` - get the number of rows and columns

`df.describe()` - get a variety of statistical calculations for all values in each column

Let's take these functions for a spin:

In [None]:
#also try pulitzers.columns[2:5]
pulitzers.columns

In [None]:
#describes the number of rows and columns
pulitzers.shape
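Because `shape` comes back as a pair of numbers (rows first, then columns), you can unpack it into two variables. This is a minimal sketch using a tiny, made-up data frame; the column names and values are hypothetical, not taken from the Pulitzer file.

```python
import pandas as pd

# A tiny, made-up data frame: 3 rows and 2 columns
df = pd.DataFrame({
    "paper": ["A", "B", "C"],
    "circulation": [1000, 2500, 400],
})

# shape is a (rows, columns) tuple, so it can be split into two variables
rows, cols = df.shape

print(f"{rows} rows and {cols} columns")
```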

In [None]:
pulitzers.describe()

Note that we can also use dot-notation to call a column when its name has no spaces or punctuation. For any other column name, use square brackets and quotes.

In [None]:
pulitzers.Newspaper

In [None]:
pulitzers["Daily Circulation, 2013"]
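Here is a minimal sketch of the two ways to select columns, side by side. It uses a made-up data frame, and the "Daily Circulation" column name is a hypothetical stand-in for the longer names in the real file.

```python
import pandas as pd

# Made-up data; these are not rows from the Pulitzer file
df = pd.DataFrame({
    "Newspaper": ["Paper A", "Paper B"],
    "Daily Circulation": [1000, 2500],
})

# Dot-notation works here because "Newspaper" has no spaces
names = df.Newspaper

# A column name with a space needs brackets and quotes
circulation = df["Daily Circulation"]

# A list of names inside the brackets returns several columns at once
both = df[["Newspaper", "Daily Circulation"]]

print(names.tolist())
print(circulation.tolist())
print(both.shape)
```

Bracket notation works for every column name, so it's a safe default when you aren't sure.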

In [None]:
pulitzers.info()

In [None]:
sns.lmplot(
    x="Pulitzer Prize Winners and Finalists, 1990-2014", 
    y="Change in Daily Circulation, 2004-2013 in Percent", 
    data=pulitzers)

### Restaurant Inspections

In this next section, let's look at a larger dataset: restaurant inspections for all restaurants in Alameda County as of Aug. 1, 2020. It has 55,000 rows, which makes it difficult to open in spreadsheet software.

In [None]:
inspections = pd.read_csv("Restaurant_Inspections.csv")
inspections.head()

We can look at the unique values in any column of data. Let's look at the Grade column to see what grades were given. (They correspond to colors.)

In [None]:
inspections.Grade.unique()

In [None]:
inspections.groupby("Grade").Facility_ID.count()

We can search by grade or by a name of a specific restaurant.

In [None]:
inspections[inspections["Grade"] == "R"]

In [None]:
reds = inspections[inspections["Grade"] == "R"]
reds.groupby("City").count()

In [None]:
inspections[inspections["Facility_Name"] == "BLIND TIGER"]
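The filters above work in two steps, which can be easier to see when they're pulled apart. The sketch below uses a small, made-up data frame: the facility names, cities and the "G" grade are hypothetical, and only the column names and the "R" grade come from the cells above.

```python
import pandas as pd

# Made-up rows standing in for the much larger inspections file
df = pd.DataFrame({
    "Facility_Name": ["DINER A", "CAFE B", "DINER A", "TACO C"],
    "City": ["Oakland", "Berkeley", "Oakland", "Berkeley"],
    "Grade": ["G", "R", "R", "G"],
})

# Step 1: the comparison produces one True/False value per row
is_red = df["Grade"] == "R"

# Step 2: putting that mask inside the brackets keeps only the True rows
reds = df[is_red]

# Two conditions can be combined with & (and); each needs its own parentheses
red_in_oakland = df[(df["Grade"] == "R") & (df["City"] == "Oakland")]

print(is_red.tolist())
print(len(reds), len(red_in_oakland))
```

Storing the filtered rows in a variable, as with `reds`, lets you keep interviewing just that subset.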