# Python for Librarians - Week 2 Workalong

Now that we have the basics under our belt we are going to look into:

- More details about functions
- What code libraries are
- The [Pandas Library](https://pandas.pydata.org/) which is *very* useful for data analysis. 


The following cell has a bit of code that will introduce two new functions to us:

- `int()`
- `input()`

See if you can determine what they do by running the following cell a few times.

In [1]:

number = int(input("Pick any number between 1 - 10: "))

if number > 10 or number < 1:
    print("Sorry you didn't pick a number between 1 - 10")
else:
    print("Your number is between 1 and 10!")

Pick any number between 1 - 10: 1
Your number is between 1 and 10!


## Question 1

Describe what the `int()` and `input()` functions do in the following markdown cell. 

The function int will...

The function input will...

## Functions

Functions are bits of code that do a certain task in a certain way. We made extensive use of the `print()` function in week 1. Some functions need _arguments_ or _parameters_, which we pass in the round brackets. For example, we put the name of the variable we want to print inside the print function as a parameter, as with `print(age)`. Sometimes a function can take many arguments, but for now let's keep it simple.

We can create our own functions too! This is useful if we want to re-use the same chunk of code many different times. When we create a function in Python it looks like this:


```
def name_of_function(argument):
    #code
    #code
    #...
    return answer #not mandatory
    
```
When our function produces a value that we want to use further, we end it with the `return` keyword followed by the variable name. That passes the value back to the place that called the function.

We've already seen how we *call* a function:

```
name_of_function(argument)

```

If you would like to print out the result of your function, you need to wrap the call in the `print` function:

```
print(name_of_function(argument))

```

Or we can do this in two lines of code if we want:

```
result = name_of_function(argument)
print(result)

```
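
To make the template above concrete, here is a minimal sketch of a complete function. The name `celsius_to_fahrenheit` and the conversion it performs are just an illustration chosen for this example; they are not part of the course data.

```python
# A hypothetical function: the name and the formula are illustrative only.
def celsius_to_fahrenheit(celsius):
    # convert a temperature from Celsius to Fahrenheit
    fahrenheit = celsius * 9 / 5 + 32
    # pass the answer back to whoever called the function
    return fahrenheit


# call the function and store the returned value in a variable
result = celsius_to_fahrenheit(10.0)
print(result)
```

Try changing the number in the round brackets and running it again to see the returned value change.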


## Another useful function...

The `len()` function tells us how long something is. In the next cell we print how long the string variable `sentence` is. Often a function can be applied to different types of variables and the results are as you would expect.

In [2]:
sentence = "Now is the time of our discontent."
print(len(sentence))

34


First let's reload our Toronto Weather Data from Week 1 and run the `len` function on it. Python tells us how many entries are in that dictionary. Long story short, Python will do a pretty good job of figuring out what you are trying to do when you use a function. In the case of `len`, it counts the characters if you give it a string; if you give it a dictionary, it gives you the number of entries in the dictionary. Neat.

In [6]:
toronto_weather = {
    "Monday": 10.0,
    "Tuesday" : 12.0,
    "Wednesday" : 15.0,
    "Thursday" : 10.0,
    "Friday" : 9.0,
    "Saturday" : 6.0,
    "Sunday" : 6.0
    }

#Here we give two parameters to print. We separate them with a comma. This is *always* how we pass multiple
#values to a function.
print("The length of our data set is:", len(toronto_weather))

The length of our data set is: 7
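
Here is a short sketch of `len` on a few more types of variables. The list of branch names is a small sample made up for this example. The last part shows, as far as we can tell from trying it, that `len` does not work on everything: a plain number has no length, so Python raises an error.

```python
branches = ["Marina", "Richmond", "Merced"]

print(len(branches))                              # number of items in the list
print(len("Marina"))                              # number of characters in the string
print(len({"Monday": 10.0, "Tuesday": 12.0}))     # number of entries in the dictionary

# len() needs something that can be counted; a plain number cannot
try:
    len(42)
except TypeError as error:
    print("len() failed:", error)
```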


## A very common function

The cell below creates a function that uses a loop to find the average temperature for our fictional week of data. We then print the result to the screen. Notice that we pass the values from the dictionary into the function when we call it with `toronto_weather.values()`.

(By the way, if you can understand the following block of code you've accomplished the hardest part of this class)


In [7]:
def average_temp(week):
    
    #create a running sum for all the entries we will loop through
    total_degrees = 0
    
    for day in week:
        # add each temperature as-is; wrapping it in int() would drop any decimal part
        total_degrees = total_degrees + day
        
    mean = total_degrees / len(week)    
    return mean


print(average_temp(toronto_weather.values()))


9.714285714285714
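
The loop above can also be written more compactly. The following is only a sketch of an alternative, not a replacement for the version above: it assumes the built-in `sum()` function, and the name `average_temp_short` and the three-day `sample_week` are made up for this example.

```python
def average_temp_short(week):
    # turn the input into a list so it also works for dictionary values
    temperatures = list(week)
    # sum() adds everything up, len() counts the entries
    return sum(temperatures) / len(temperatures)


sample_week = {"Monday": 10.0, "Tuesday": 12.0, "Wednesday": 15.0}
print(average_temp_short(sample_week.values()))
```

Both versions do the same arithmetic; the loop just spells out each step.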


## Questions about Functions 

----

## Libraries
It's not very useful to keep writing our own functions for common calculations; the point of programming is to build on what has already been created. To demonstrate this we are going to look at importing libraries, starting with the Python [statistics](https://docs.python.org/3/library/statistics.html) library in the following cell. Python makes a set of built-in functions available by default when it runs, which is why we don't need to import anything before we use `print()`, for example.

Calculating an average is such a common task we don't want to write our own version of it. We'll use the code written by some expert instead. When we want to use a library we use the `import` keyword.

In [10]:

#import the Python statistics Library.
import statistics

#the mean function is called statistics.mean. Here we run it on the values from our dictionary
mean = statistics.mean(toronto_weather.values())

print(mean)

9.714285714285714


As we saw with `len` on strings and dictionaries, `statistics.mean` works the same way on different kinds of data. It calculates the mean of the values that you give it. Watch it find the mean of a list in the following cell.

In [11]:
marks = [80,75,85,70,60]

mean = statistics.mean(marks)

print(mean)

74
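
The `statistics` library has more than just `mean`. As a quick sketch, here are two other functions from it applied to the same marks, alongside Python's built-in `min` and `max`. Which of these you need will depend on your data.

```python
import statistics

marks = [80, 75, 85, 70, 60]

print(statistics.median(marks))   # the middle value once the marks are sorted
print(statistics.stdev(marks))    # sample standard deviation: how spread out the marks are
print(min(marks), max(marks))     # lowest and highest mark (built-in functions, no import needed)
```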


So far so good?

----

## The Pandas Library

Now we are going to focus on a great library that will allow us to look at data from CSV files and spreadsheets. We are now finally getting into the data science portion of the class! The following cell imports pandas, creates a pandas dataframe variable, and loads a CSV file into it. (When we import a library we sometimes rename it so that there is less to type when we use it. That's why line 1 ends with `as pd`, which lets us write shorter code, such as `pd.read_csv` instead of `pandas.read_csv`. Almost all tutorials you find online will do this when using pandas.)
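
Renaming on import works with any library, not just pandas. Here is a minimal sketch using the `statistics` library from earlier. The short name `st` is an arbitrary choice for this example, not a common convention.

```python
# give the library a shorter name for the rest of the notebook
import statistics as st

# or import a single function so it can be used without any prefix
from statistics import mean

marks = [80, 75, 85, 70, 60]

print(st.mean(marks))   # using the renamed library
print(mean(marks))      # using the function imported on its own
```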

We are going to explore some [San Francisco Library Usage](https://www.kaggle.com/datasf/sf-library-usage-data) data. The dataset we are looking at is slightly truncated. As workers in libraries we are often given data as Excel or CSV files. We are going to see how we can use Pandas to analyze a CSV file. My hope is that by the end of the lesson you'll try Pandas instead of Excel next time you have a data file to look at.

In [12]:
import pandas as pd

#Pandas has a function that will open up a csv and load it into a dataframe
sfl_data = pd.read_csv("week_2_san_francisco_worksheet.csv")

#will show us the top 10 entries in our dataframe using the head function along with the argument 10
sfl_data.head(10)

| | Patron Type Definition | Total Checkouts | Total Renewals | Home Library Definition | Circulation Active Month | Circulation Active Year | Year Patron Registered |
|---|---|---|---|---|---|---|---|
| 0 | YOUNG ADULT | 0 | 0 | Marina | | | 2015 |
| 1 | ADULT | 3 | 12 | Richmond | April | 2014.0 | 2013 |
| 2 | JUVENILE | 11 | 0 | Bayview/Linda Brooks-Burton | June | 2014.0 | 2010 |
| 3 | JUVENILE | 171 | 111 | Merced | July | 2016.0 | 2014 |
| 4 | YOUNG ADULT | 14 | 1 | Richmond | October | 2013.0 | 2011 |
| 5 | ADULT | 5 | 0 | Main Library | May | 2013.0 | 2013 |
| 6 | YOUNG ADULT | 194 | 178 | North Beach | July | 2016.0 | 2004 |
| 7 | ADULT | 26 | 9 | Glen Park | July | 2016.0 | 2003 |
| 8 | ADULT | 14 | 0 | Ingleside | September | 2013.0 | 2012 |
| 9 | ADULT | 0 | 0 | Main Library | January | 2014.0 | 2014 |
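
If you don't have the worksheet file handy, you can still experiment with `read_csv` and `head`. The sketch below is an assumption-laden stand-in: it copies a few rows and columns from the output above into a piece of text and reads that text as if it were a file, using `io.StringIO` from Python's standard library. The variable names `csv_text` and `sample` are made up for this example.

```python
import io

import pandas as pd

# A few rows copied from the output above, standing in for the real CSV file
csv_text = """Patron Type Definition,Total Checkouts,Total Renewals,Home Library Definition
YOUNG ADULT,0,0,Marina
ADULT,3,12,Richmond
JUVENILE,11,0,Bayview/Linda Brooks-Burton
JUVENILE,171,111,Merced
YOUNG ADULT,14,1,Richmond
"""

# read_csv accepts a file name or a file-like object such as StringIO
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)      # (number of rows, number of columns)
print(sample.head(2))    # the first two rows
print(sample.tail(2))    # the last two rows
```

Calling `head()` with no argument shows the first five rows.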


To get a general quantitative overview of our dataframe we apply the **.describe()** function to it. It will only apply those calculations to columns that have numeric data.

In [13]:
sfl_data.describe()

| | Total Checkouts | Total Renewals | Year Patron Registered |
|---|---|---|---|
| count | 10000.0 | 10000.0 | 10000.0 |
| mean | 166.5478 | 62.5035 | 2010.2911 |
| std | 447.643316 | 247.152412 | 4.366173 |
| min | 0.0 | 0.0 | 2003.0 |
| 25% | 2.0 | 0.0 | 2007.0 |
| 50% | 20.0 | 2.0 | 2012.0 |
| 75% | 116.0 | 27.0 | 2014.0 |
| max | 7947.0 | 5953.0 | 2016.0 |
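
Here is a small sketch showing that `describe` skips text columns by default, and two variations you may find handy. The tiny dataframe is built by hand from a few rows shown earlier, so the numbers it prints will not match the full dataset.

```python
import pandas as pd

# A hand-made sample using a few rows from the head() output earlier
sample = pd.DataFrame({
    "Patron Type Definition": ["YOUNG ADULT", "ADULT", "JUVENILE", "JUVENILE"],
    "Total Checkouts": [0, 3, 11, 171],
    "Total Renewals": [0, 12, 0, 111],
})

# By default only the numeric columns are summarised
print(sample.describe().columns.tolist())

# Summarise a single column
print(sample["Total Checkouts"].describe())

# include="all" asks for every column, text columns included
print(sample.describe(include="all"))
```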


Wow! One patron has **7947** total checkouts.

# Pandas Pandas Pandas

Let's take a tour of all of the functions that we can use on a pandas dataframe. These often allow us to see different broad details about what is in our data.

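**.groupby()** Splits the rows of the dataframe into groups, one group for each distinct value in the column you name.

**.count()** Tells us how many non-missing entries are in each column. In this example we group by the `Home Library Definition` column, so each row of the output is one home library, and count is applied to each of the remaining columns for that library. Compare the output of the cell above to the one below.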

In [41]:
# How many patrons does each home library have?
sfl_data.groupby("Home Library Definition").count()

| Home Library Definition | Patron Type Definition | Total Checkouts | Total Renewals | Circulation Active Month | Circulation Active Year | Year Patron Registered |
|---|---|---|---|---|---|---|
| Anza | 168 | 168 | 168 | 168 | 168 | 168 |
| Bayview/Linda Brooks-Burton | 193 | 193 | 193 | 193 | 193 | 193 |
| Bernal Heights | 236 | 236 | 236 | 236 | 236 | 236 |
| Branch Bookmobile (West Portal) | 5 | 5 | 5 | 5 | 5 | 5 |
| Children's Bookmobile | 15 | 15 | 15 | 15 | 15 | 15 |
| Chinatown | 407 | 407 | 407 | 407 | 407 | 407 |
| Eureka Valley/Harvey Milk Memorial | 226 | 226 | 226 | 226 | 226 | 226 |
| Excelsior | 399 | 399 | 399 | 399 | 399 | 399 |
| Glen Park | 246 | 246 | 246 | 246 | 246 | 246 |
| Golden Gate Valley | 112 | 112 | 112 | 112 | 112 | 112 |
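
One detail worth knowing: `count` only counts entries that are not missing, so the numbers can differ from column to column. The sketch below uses a hand-made sample of four rows from the earlier output, including the Marina patron whose active month is blank. It also shows `size`, which simply counts the rows in each group.

```python
import pandas as pd

# Four rows copied from the head() output; None marks the blank month
sample = pd.DataFrame({
    "Home Library Definition": ["Marina", "Richmond", "Richmond", "Merced"],
    "Circulation Active Month": [None, "April", "October", "July"],
    "Total Checkouts": [0, 3, 14, 171],
})

# count() gives the number of non-missing entries per column in each group
counts = sample.groupby("Home Library Definition").count()
print(counts)

# size() gives the number of rows in each group, missing values or not
sizes = sample.groupby("Home Library Definition").size()
print(sizes)
```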


Let's look at how we can investigate different snapshots of the data.

In [34]:
#how many patrons there are of each patron type
sfl_data.groupby("Patron Type Definition").count()

| Patron Type Definition | Total Checkouts | Total Renewals | Home Library Definition | Circulation Active Month | Circulation Active Year | Year Patron Registered |
|---|---|---|---|---|---|---|
| ADULT | 6454 | 6454 | 6454 | 6454 | 6454 | 6454 |
| AT USER ADULT | 3 | 3 | 3 | 3 | 3 | 3 |
| AT USER SENIOR | 1 | 1 | 1 | 1 | 1 | 1 |
| AT USER WELCOME | 1 | 1 | 1 | 1 | 1 | 1 |
| DIGITAL ACCESS CARD | 41 | 41 | 41 | 41 | 41 | 41 |
| JUVENILE | 1396 | 1396 | 1396 | 1396 | 1396 | 1396 |
| RETIRED STAFF | 7 | 7 | 7 | 7 | 7 | 7 |
| SENIOR | 981 | 981 | 981 | 981 | 981 | 981 |
| SPECIAL | 21 | 21 | 21 | 21 | 21 | 21 |
| STAFF | 31 | 31 | 31 | 31 | 31 | 31 |
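
If all we want is a single number or a single column of counts, two other pandas functions may be a better fit. This is a sketch on a hand-made sample of patron types taken from the first few rows shown earlier, so the counts are not those of the full dataset.

```python
import pandas as pd

# Patron types copied from the first six rows of the head() output
sample = pd.DataFrame({
    "Patron Type Definition": [
        "YOUNG ADULT", "ADULT", "JUVENILE", "JUVENILE", "YOUNG ADULT", "ADULT",
    ],
})

# nunique() answers "how many different patron types are there?"
print(sample["Patron Type Definition"].nunique())

# value_counts() answers "how many patrons are there of each type?"
print(sample["Patron Type Definition"].value_counts())
```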


In [37]:
#What is the average number of Total Checkouts for all of these patrons?
sfl_data["Total Checkouts"].mean()

166.5478

In [38]:
#What is the Maximum number of Total Checkouts for all of these patrons?
sfl_data["Total Checkouts"].max()

7947
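
We can combine the pieces from this section. The sketch below, again on a hand-made sample of six rows from the earlier output, groups by patron type and then takes the mean of one column for each group. It also uses `idxmax`, which returns the row label of the largest value, to pull out the whole row for the busiest patron.

```python
import pandas as pd

# Six rows copied from the head() output earlier
sample = pd.DataFrame({
    "Patron Type Definition": [
        "YOUNG ADULT", "ADULT", "JUVENILE", "JUVENILE", "YOUNG ADULT", "ADULT",
    ],
    "Total Checkouts": [0, 3, 11, 171, 14, 5],
    "Home Library Definition": [
        "Marina", "Richmond", "Bayview/Linda Brooks-Burton",
        "Merced", "Richmond", "Main Library",
    ],
})

# average number of checkouts for each patron type
means = sample.groupby("Patron Type Definition")["Total Checkouts"].mean()
print(means)

# the full row for the patron with the most checkouts
busiest = sample.loc[sample["Total Checkouts"].idxmax()]
print(busiest)
```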