<a href="https://colab.research.google.com/github/esohman/EADH/blob/main/3_EADH_intermediate.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#The Intermediate notebook
Welcome to the intermediate notebook.
By now you should have a solid grasp of how different data types work and how they can be used and manipulated in different situations.

You should also understand for loops and list comprehensions, be able to carry out small-scale NLP projects using NLTK and/or spaCy, and understand how to create and use regex patterns.

If you feel that this is not the case, I recommend that you go back and revise the sections that you feel you do not yet quite understand. Look at the notebooks themselves and follow the links to additional resources to better understand all the different aspects of Python programming at the beginner level.

Welcome to the intermediate level!




#Functions

The first thing we are going to discuss is functions. Functions are a way to reuse code. Instead of writing the same or similar piece of code multiple times, you can create a function and call it every time you need it.

Say you have a list of tuples that contain the heights and bases of triangles. You can create a for loop to go through that list and, inside the loop, pass those values to a function that calculates the area of each triangle.

Check out the excellent [video](https://pythonhumanities.com/lesson-10-python-functions/) on functions on Python for DH.


---


We create functions with the keyword **def**. This is followed by the name of the function and then parentheses. The parentheses can be left empty, but they typically contain the names of the variables that the function receives when it is called and then uses inside its body. In the code below, you pass the arguments i and j to the function tri_area. Inside that function, i and j are known as h and b. As a side note, i and j are called arguments when they are passed to the function, whereas h and b are called parameters. Some people use the two terms interchangeably, but strictly speaking there is a distinction.

Functions usually end with a return statement. The return statement specifies what the function outputs.

In [None]:
def tri_area(h,b):
  return (h*b)/2

lst = [(2.45,4),(3.3,5.4),(4.2,2.7),(4,4),(68,45)]
for i,j in lst:
  print(tri_area(i,j))
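
To see what the return statement does, here is a minimal sketch that compares a function that returns its result with one that only prints it. The names `tri_area_print`, `area`, and `nothing` are made up for this illustration.

```python
def tri_area(h, b):
    # return hands the value back to whoever called the function
    return (h * b) / 2

def tri_area_print(h, b):
    # this version only prints; without a return statement it gives back None
    print((h * b) / 2)

area = tri_area(4, 4)            # the result is stored in area
nothing = tri_area_print(4, 4)   # the result is printed, but nothing holds None
print(area, nothing)
```

A returned value can be stored or passed on to other functions, whereas a printed value is only shown on screen.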



A more complex example where we call another function from within a function:

In [None]:
from math import pi

def area_circle(r):
  return pi * r ** 2

def vol_cylinder(r,h):
  return area_circle(r)*h

#we can use the same list of values from the previous example
lst = [(2.45,4),(3.3,5.4),(4.2,2.7),(4,4),(68,45)]

for i,j in lst:
  print(f'The area of the circle is {"{:.2f}".format(area_circle(i))} and the volume of the cylinder is {"{:.2f}".format(vol_cylinder(i,j))}')


###format()
You might have noticed that we used .format() when printing. You do not have to use format here, but you can use it to specify the number of decimal places you want to show.

Learn more about format() [here](https://www.w3schools.com/python/ref_string_format.asp).
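
As a small sketch of how the format specification works, the example below rounds pi to a chosen number of decimal places. It also shows that the same specification can be written directly inside an f-string; the variable names `two_dp` and `four_dp` are made up for this illustration.

```python
from math import pi

two_dp = "{:.2f}".format(pi)   # format() with two decimal places
four_dp = f"{pi:.4f}"          # the same kind of specification inside an f-string

print(two_dp)
print(four_dp)
```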

###input
Sometimes you want to get user input. There are ways of creating websites and graphical user interfaces in Python, but what if we just want a value or two to make our script interactive? For that we can use the built-in input() function, which returns whatever the user typed as a string.

In [None]:
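def monthly_pay(hrs, hpay, extra=0):  # we can set a default value for a parameter
  # let's assume a tax rate of 20%; this could be a parameter too
  # note that, as written, the bonus in extra is added after tax
  return (hrs * hpay) * 0.8 + extra


# input() returns a string, so we convert the answers to floats
hours = float(input("How many hours did you work this month: "))
hourly = float(input("What is your hourly pay: "))

e = input("Are you getting a bonus or similar? (y/n) ")

if e == "y":
  extra = float(input("How much: "))
elif e == "n":
  extra = 0
else:
  # unrecognized answer: report it and fall back to no bonus so that extra is always defined
  print("Error: please answer y or n. Assuming no bonus.")
  extra = 0

print(f'Your monthly take-home pay is {monthly_pay(hours, hourly, extra)}')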
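
The float() conversions above raise an error if the user types something that is not a number. One possible way to guard against that is sketched below, using try/except to catch the error. The helper `parse_amount` and its `default` parameter are hypothetical names, and the example uses fixed strings in place of real input() calls so that it runs without typing anything.

```python
def parse_amount(text, default=0.0):
    # try to turn the typed text into a float; fall back to default if that fails
    try:
        return float(text)
    except ValueError:
        return default

# in the notebook you could write: parse_amount(input("How much: "))
good = parse_amount("250")
bad = parse_amount("a lot")
print(good, bad)
```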


##Modules, libraries, and documentation
We have increasingly started using external modules and libraries. These libraries provide functions that are not built into your base Python installation. Sometimes you need to install them with pip. On Colab you can run pip in a code cell with "!pip install pandas", where pandas is the name of the library you want to install. Many of the most commonly used libraries are already installed on Colab, so you rarely have to do this.

We import these libraries so that we can use them in our code. We do this by typing import followed by the name of the library we want. We can also import only certain parts of a library. In the functions example we could have written:

```
import math

r = 2 #an example radius
print(math.pi*r**2)
```
or
```
from math import pi

r = 2 #an example radius
print(pi*r**2)
```
We can also rename the libraries we are importing:
```
import pandas as pd
```
All decent libraries come with documentation. Documentation is very important in learning to understand how to use new libraries, or how to get the most out of familiar libraries.

If you are ever stumped on how to do something, the documentation of the library you are using should be one of the first places you look for more information. Stack Overflow is another good place to look.
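
Some documentation is also available without leaving the notebook. As a minimal sketch, the built-in help() function prints the documentation of a function, and the same text is stored in its `__doc__` attribute. The choice of `math.sqrt` and the variable name `doc` are only for illustration.

```python
import math

help(math.sqrt)           # prints the built-in documentation for math.sqrt
doc = math.sqrt.__doc__   # the same text is stored as a string on the function
print(doc)
```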


#Pandas & Numpy

pandas is a highly useful Python library for data analysis.
Really, anytime you are dealing with csv files, you should consider if pandas might be the best option for the task at hand.

pandas is built on NumPy, so using NumPy's mathematical functions with pandas is quick and easy.

With pandas we can create dataframes, which are kind of like spreadsheets in that we have rows and columns of data. With pandas it is very easy to manipulate this data.
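
As a small sketch of how NumPy functions combine with pandas, the example below applies np.sqrt to a pandas Series. The values and the names `values` and `roots` are made up for this illustration.

```python
import numpy as np
import pandas as pd

values = pd.Series([1, 4, 9, 16])

# NumPy functions work element by element and give back a pandas Series
roots = np.sqrt(values)

print(roots)
print(values.mean())  # pandas also has its own built-in summary methods
```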


## Series
A Series is like a one-column dataframe. It cannot have a column name, but it can have a series name, and you can name the rows.


In [None]:
import pandas as pd

my_list = [123,2134,123] # a list of integers

#let's make this list into a pandas series
my_series = pd.Series(my_list)

In [None]:
print(f'My list: {my_list}\nMy Series: \n{my_series}')

In [None]:
#accessing individual elements
print(my_series[1])

In [None]:
#renaming the index
my_series1 = pd.Series(my_list,index = ["first", "second", "third"])
my_series2 = my_series.rename(index = {0:"first",1:"second",2:"third"}) #there is also the "inplace" option. Remember what it does?

In [None]:
print(f'1:\n{my_series1}\n2: \n{my_series2}')
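
Besides row labels, a Series can carry a name of its own. The sketch below, using made-up values and the hypothetical variable names `heights` and `as_frame`, sets the name and shows that it becomes the column name when the Series is turned into a dataframe with to_frame().

```python
import pandas as pd

heights = pd.Series([2.45, 3.3, 4.2], index=["a", "b", "c"], name="height")

print(heights.name)   # the name of the series
print(heights["b"])   # access a value by its row label

as_frame = heights.to_frame()  # the series name is used as the column name
print(as_frame)
```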

In [None]:
#Creating a series from a dictionary
dicty = {"first":"lalala","second":123,"third":99.4}
my_dictseries = pd.Series(dicty) #if you specify the index here, the series will only consist of the specified indexes e.g.: my_dictseries = pd.Series(dicty, index = ["first","third"])

In [None]:
print(f'Series from dict:\n{my_dictseries}')

##Dataframes

In [None]:
#we can merge series to create a dataframe
df1 = pd.concat([my_series1, my_series2,my_dictseries], axis=1)

#or we can create one from a dictionary (typically a dictionary of lists)
d = {"ex1":[89,8,6,1,2,7,6],"ex2":[7,5,1,66,8,74,1]}
df2 = pd.DataFrame(d)

In [None]:
print(f'Df1:\n{df1}\nDf2:\n{df2}')

In [None]:
#access a specific row by its position using iloc (loc selects by index label instead)
df2.iloc[3]
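
The difference between iloc and loc is easier to see when the index consists of labels rather than numbers. Here is a minimal sketch with a made-up dataframe; the names `demo`, `by_position`, `by_label`, and `cell` are only for illustration.

```python
import pandas as pd

demo = pd.DataFrame({"ex1": [89, 8, 6], "ex2": [7, 5, 1]}, index=["a", "b", "c"])

by_position = demo.iloc[1]   # the second row, counting from 0
by_label = demo.loc["b"]     # the row whose index label is "b"
cell = demo.loc["b", "ex2"]  # a single value: row "b", column "ex2"

print(by_position)
print(by_label)
print(cell)
```

Both selections return the same row here, because "b" is the label of the row at position 1.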


In [None]:
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/data/XED/error.csv',index_col = 0) #replace this with any csv file (there's one under week 3 that contains Trump's insulting tweets)

In [None]:
df.head(12)

In [None]:
df.tail(7)

In [None]:
df.info()

In [None]:
df.shape
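
If you do not have a csv file at hand, you can try the same inspection methods on a small dataframe. The sketch below reuses the example values from the Dataframes section; the variable name `demo` is made up.

```python
import pandas as pd

demo = pd.DataFrame({"ex1": [89, 8, 6, 1, 2, 7, 6], "ex2": [7, 5, 1, 66, 8, 74, 1]})

print(demo.head(3))   # the first three rows
print(demo.tail(2))   # the last two rows
print(demo.shape)     # a tuple of (number of rows, number of columns)
demo.info()           # column names, data types, and non-null counts
```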

#Data Analysis

#Visualization