# Getting Started with Pandas
Run this file after completing "Getting Started with Python".

In [0]:
# This cell is necessary only if you are running on Google Colab

import requests

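def save_file(url, file_name):
  """Download `url` and write the raw response bytes to `file_name`."""
  r = requests.get(url)
  r.raise_for_status()  # fail loudly on HTTP errors instead of saving an error page
  with open(file_name, 'wb') as f:
    f.write(r.content)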

save_file('https://homes.cs.washington.edu/~hschafer/cse416/a0/people-example.csv',
          'people-example.csv')

In [0]:
import pandas as pd

First, we load the CSV file into a `pandas.DataFrame`.

In [0]:
table = pd.read_csv('people-example.csv')

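To see what `read_csv` does without downloading anything, here is a minimal sketch that parses a few of the rows from this dataset out of an in-memory string. The `csv_text` and `sample` names are illustrative, and the example assumes the file begins with a `First Name,Last Name,Country,age` header, which matches the columns displayed below.

```python
import io

import pandas as pd

# A few rows of the CSV written out by hand
# (assumed header: First Name,Last Name,Country,age)
csv_text = """First Name,Last Name,Country,age
Bob,Smith,United States,24
Alice,Williams,Canada,23
Malcolm,Jone,England,22
"""

# read_csv accepts any file-like object, so io.StringIO can stand in for a file on disk
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)   # (3, 4): three rows, four columns
print(sample.dtypes)  # the type pandas inferred for each column
```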
# `DataFrame` Tutorial
A `DataFrame` stores the data from a CSV as a table of rows and columns that is easy to access. You can start by inspecting the `DataFrame`:

In [0]:
table

|   | First Name | Last Name | Country | age |
|---|---|---|---|---|
| 0 | Bob | Smith | United States | 24 |
| 1 | Alice | Williams | Canada | 23 |
| 2 | Malcolm | Jone | England | 22 |
| 3 | Felix | Brown | USA | 23 |
| 4 | Alex | Cooper | Poland | 23 |
| 5 | Tod | Campbell | United States | 22 |
| 6 | Derek | Ward | Switzerland | 25 |

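Beyond displaying the whole table, a few attributes and methods give a quick overview of a `DataFrame`. The sketch below rebuilds the first three rows shown above with the `pd.DataFrame` constructor (an assumption made so the example runs without the CSV file) and inspects them with `shape`, `columns`, and `head`.

```python
import pandas as pd

# Rebuild the first three rows of the table above (subset chosen for brevity)
table = pd.DataFrame({
    'First Name': ['Bob', 'Alice', 'Malcolm'],
    'Last Name': ['Smith', 'Williams', 'Jone'],
    'Country': ['United States', 'Canada', 'England'],
    'age': [24, 23, 22],
})

print(table.shape)          # (number of rows, number of columns)
print(list(table.columns))  # the column names, in order
print(table.head(2))        # only the first 2 rows
```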

## Inspecting Columns of a `DataFrame`
One of the central ideas of a `DataFrame` is the notion of a column. You can access any column from the CSV using bracket notation; a single column comes back as a `pandas.Series`.

In [0]:
table['Country']

0    United States
1           Canada
2          England
3              USA
4           Poland
5    United States
6      Switzerland
Name: Country, dtype: object

In [0]:
table['age']

0    24
1    23
2    22
3    23
4    23
5    22
6    25
Name: age, dtype: int64

You can also pass a list of column names to get a subset of the `DataFrame` back.

In [0]:
table[['Country', 'age']]

|   | Country | age |
|---|---|---|
| 0 | United States | 24 |
| 1 | Canada | 23 |
| 2 | England | 22 |
| 3 | USA | 23 |
| 4 | Poland | 23 |
| 5 | United States | 22 |
| 6 | Switzerland | 25 |

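The number of brackets decides what kind of object you get back. A small sketch, using a hand-built two-column frame as a stand-in for `table`: one column name in single brackets gives a `Series`, while a list of names (even a list with one name) gives a `DataFrame`.

```python
import pandas as pd

# A stand-in for part of `table`, built by hand
table = pd.DataFrame({
    'Country': ['United States', 'Canada', 'England'],
    'age': [24, 23, 22],
})

ages_series = table['age']    # one name in single brackets -> a Series
ages_frame = table[['age']]   # a list with one name -> a one-column DataFrame

print(type(ages_series).__name__)  # Series
print(type(ages_frame).__name__)   # DataFrame
```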

You can also compute summary values from a column. There are many such methods, but some common ones are `mean`, `min`, `max`, and `sum`.

In [0]:
table['age'].mean()

23.142857142857142

In [0]:
table['age'].max()

25

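The other methods listed above work the same way. This sketch copies the `age` values from the table into a standalone `Series` (so it runs without the CSV) and also calls `describe`, which reports several summary statistics at once.

```python
import pandas as pd

# The age column from the table above
ages = pd.Series([24, 23, 22, 23, 23, 22, 25], name='age')

print(ages.min())       # 22
print(ages.sum())       # 162
print(ages.mean())      # same value as table['age'].mean() above
print(ages.describe())  # count, mean, std, min, quartiles and max in one call
```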
# Creating new columns
A core part of using `pandas` is computing new values from the data in the dataset.

In [0]:
table

|   | First Name | Last Name | Country | age |
|---|---|---|---|---|
| 0 | Bob | Smith | United States | 24 |
| 1 | Alice | Williams | Canada | 23 |
| 2 | Malcolm | Jone | England | 22 |
| 3 | Felix | Brown | USA | 23 |
| 4 | Alex | Cooper | Poland | 23 |
| 5 | Tod | Campbell | United States | 22 |
| 6 | Derek | Ward | Switzerland | 25 |


The following cell does two things:
* Computes a series of `First Name` followed by a space followed by `Last Name` 
* Stores that series in a new column named `Full Name`

Remember that the right-hand side of an assignment is evaluated first. `pandas` supports what are known as "element-wise" operations: when you do something like add a space to a column, the space is added to each element in the column. Adding two columns together adds the corresponding elements, matched by index label (here, elements in the same row). For strings, addition means "string concatenation".

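A small sketch of element-wise operations on hand-built `Series`. The first part mirrors the `Full Name` computation; the second shows that two `Series` are matched by index label rather than by physical position, which only makes a difference when the indexes are in different orders.

```python
import pandas as pd

first = pd.Series(['Bob', 'Alice'])
last = pd.Series(['Smith', 'Williams'])

# A scalar (' ') is combined with every element; two Series are combined element by element
full = first + ' ' + last
print(full.tolist())  # ['Bob Smith', 'Alice Williams']

# Elements are matched by index label, not by position
a = pd.Series([1, 2], index=[0, 1])
b = pd.Series([10, 20], index=[1, 0])
total = (a + b).sort_index()
print(total.tolist())  # [21, 12]: label 0 -> 1 + 20, label 1 -> 2 + 10
```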
In [0]:
table['Full Name'] = table['First Name'] + ' ' + table['Last Name']

In [0]:
table['Random Column 1'] = [2, 3, 1, 4, 2, 5, 4]         # create a new column from a list (one value per row)
table['Random Column 2'] = table['Random Column 1'] * 3  # derive another new column element-wise

In [0]:
table

|   | First Name | Last Name | Country | age | Full Name | Random Column 1 | Random Column 2 |
|---|---|---|---|---|---|---|---|
| 0 | Bob | Smith | United States | 24 | Bob Smith | 2 | 6 |
| 1 | Alice | Williams | Canada | 23 | Alice Williams | 3 | 9 |
| 2 | Malcolm | Jone | England | 22 | Malcolm Jone | 1 | 3 |
| 3 | Felix | Brown | USA | 23 | Felix Brown | 4 | 12 |
| 4 | Alex | Cooper | Poland | 23 | Alex Cooper | 2 | 6 |
| 5 | Tod | Campbell | United States | 22 | Tod Campbell | 5 | 15 |
| 6 | Derek | Ward | Switzerland | 25 | Derek Ward | 4 | 12 |

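Assigning to `table['...']` changes `table` itself. If you would rather keep the original untouched, `DataFrame.assign` returns a new `DataFrame` with the extra column. A minimal sketch on a two-row stand-in for `table`; the column name `age_in_months` is hypothetical, chosen only for illustration.

```python
import pandas as pd

table = pd.DataFrame({'First Name': ['Bob', 'Alice'], 'age': [24, 23]})

# assign() returns a NEW DataFrame with the extra column; `table` is left unchanged
with_months = table.assign(age_in_months=table['age'] * 12)

print(list(table.columns))        # ['First Name', 'age']
print(list(with_months.columns))  # ['First Name', 'age', 'age_in_months']
```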

## Removing Columns from a `DataFrame`
The following code calls the `drop` method, which removes rows or columns from the `DataFrame`.

Whenever we work with structures that have multiple dimensions (in this case, rows and columns), `pandas` (like NumPy) follows the convention that rows are the first dimension (axis 0) and columns are the second dimension (axis 1). We pass `axis=1` in the following call so that it removes the labels along axis 1 (i.e. the columns).

In [0]:
table = table.drop(labels = ['Random Column 1', 'Random Column 2'], axis = 1)

In [0]:
table

|   | First Name | Last Name | Country | age | Full Name |
|---|---|---|---|---|---|
| 0 | Bob | Smith | United States | 24 | Bob Smith |
| 1 | Alice | Williams | Canada | 23 | Alice Williams |
| 2 | Malcolm | Jone | England | 22 | Malcolm Jone |
| 3 | Felix | Brown | USA | 23 | Felix Brown |
| 4 | Alex | Cooper | Poland | 23 | Alex Cooper |
| 5 | Tod | Campbell | United States | 22 | Tod Campbell |
| 6 | Derek | Ward | Switzerland | 25 | Derek Ward |

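`drop` can also be written with the `columns=` keyword, which avoids remembering which axis number means columns, and the same method removes rows when `axis=0` (the default). A sketch on a three-row stand-in for `table`:

```python
import pandas as pd

table = pd.DataFrame({
    'First Name': ['Bob', 'Alice', 'Malcolm'],
    'age': [24, 23, 22],
    'Random Column 1': [2, 3, 1],
})

# Equivalent to table.drop(labels=['Random Column 1'], axis=1)
no_random = table.drop(columns=['Random Column 1'])

# axis=0 drops rows, identified by their index labels (here, the row labelled 1)
no_alice = table.drop(labels=[1], axis=0)

print(list(no_random.columns))  # ['First Name', 'age']
print(no_alice.index.tolist())  # [0, 2]
```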

Almost every operation on a `DataFrame` returns a new `DataFrame` (or `Series`) rather than modifying the original; assigning to a column, as in `table['Full Name'] = ...`, is a notable exception. This is why in the previous cell we had to write `table = ...`. For example, if we multiply the `age` column by itself below without assigning the result anywhere, `table` does not change.

In [0]:
table['age'] * table['age']

0    576
1    529
2    484
3    529
4    529
5    484
6    625
Name: age, dtype: int64

In [0]:
table

|   | First Name | Last Name | Country | age | Full Name |
|---|---|---|---|---|---|
| 0 | Bob | Smith | United States | 24 | Bob Smith |
| 1 | Alice | Williams | Canada | 23 | Alice Williams |
| 2 | Malcolm | Jone | England | 22 | Malcolm Jone |
| 3 | Felix | Brown | USA | 23 | Felix Brown |
| 4 | Alex | Cooper | Poland | 23 | Alex Cooper |
| 5 | Tod | Campbell | United States | 22 | Tod Campbell |
| 6 | Derek | Ward | Switzerland | 25 | Derek Ward |

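The contrast between discarding and storing a result can be sketched in a few lines. This uses a small stand-in for `table`; the column name `age squared` is hypothetical.

```python
import pandas as pd

table = pd.DataFrame({'First Name': ['Bob', 'Alice', 'Malcolm'], 'age': [24, 23, 22]})

# Each of these returns a new object; because the result is not stored, `table` is unchanged
table['age'] * table['age']
table.drop(columns=['age'])
print(table.columns.tolist())  # ['First Name', 'age']

# To keep a result, store it: reassign the variable, or assign to a column
table['age squared'] = table['age'] * table['age']
print(table['age squared'].tolist())  # [576, 529, 484]
```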

# Filtering data
You can use comparisons and logical operators to filter data in a `DataFrame`. For example, if you compare the `age` column to 23, you get a `Series` of `boolean` values, one for each row.

In [0]:
table['age'] < 23

0    False
1    False
2     True
3    False
4    False
5     True
6    False
Name: age, dtype: bool

You can use this series of `boolean` values to index into the `DataFrame` and only get the rows where the value is `True`.

In [0]:
table[table['age'] < 23] 

|   | First Name | Last Name | Country | age | Full Name |
|---|---|---|---|---|---|
| 2 | Malcolm | Jone | England | 22 | Malcolm Jone |
| 5 | Tod | Campbell | United States | 22 | Tod Campbell |


To make this clearer, you could also break this up into two steps:

In [0]:
mask = table['age'] < 23
table[mask]

|   | First Name | Last Name | Country | age | Full Name |
|---|---|---|---|---|---|
| 2 | Malcolm | Jone | England | 22 | Malcolm Jone |
| 5 | Tod | Campbell | United States | 22 | Tod Campbell |

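Several conditions can be combined into one mask. Python's `and`, `or`, and `not` do not work element-wise on a `Series`, so `pandas` uses `&`, `|`, and `~` instead. A sketch on four rows copied from the table:

```python
import pandas as pd

table = pd.DataFrame({
    'First Name': ['Bob', 'Alice', 'Malcolm', 'Tod'],
    'Country': ['United States', 'Canada', 'England', 'United States'],
    'age': [24, 23, 22, 22],
})

# Each comparison needs its own parentheses because & and | bind more tightly than < and ==
young_us = table[(table['age'] < 23) & (table['Country'] == 'United States')]
young_or_canada = table[(table['age'] < 23) | (table['Country'] == 'Canada')]
not_us = table[~(table['Country'] == 'United States')]

print(young_us['First Name'].tolist())         # ['Tod']
print(young_or_canada['First Name'].tolist())  # ['Alice', 'Malcolm', 'Tod']
print(not_us['First Name'].tolist())           # ['Alice', 'Malcolm']
```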

What's returned in this case is also a `DataFrame`, so you can use any of the `pandas` operations we have discussed above.

In [0]:
table[table['age'] < 23]['Full Name'] # fetching full names of people with age < 23

2    Malcolm Jone
5    Tod Campbell
Name: Full Name, dtype: object

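Chaining two sets of brackets works for reading, but `.loc` can select rows by a mask and a column by name in one step, and it is also the reliable way to assign to a filtered selection. A sketch on a three-row stand-in for `table`; the change to the ages in the second part is purely illustrative.

```python
import pandas as pd

table = pd.DataFrame({
    'Full Name': ['Bob Smith', 'Malcolm Jone', 'Tod Campbell'],
    'age': [24, 22, 22],
})

# Rows by boolean mask, column by name, in a single indexing step
young_names = table.loc[table['age'] < 23, 'Full Name']
print(young_names.tolist())  # ['Malcolm Jone', 'Tod Campbell']

# .loc also works on the left-hand side of an assignment, where chained
# indexing such as table[mask]['age'] = ... would not reliably update `table`
table.loc[table['age'] < 23, 'age'] = 23
print(table['age'].tolist())  # [24, 23, 23]
```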
In [0]:
# Exercise: select the people whose age is above the average age
mask = table['age'] > table['age'].mean()
table[mask]

|   | First Name | Last Name | Country | age | Full Name |
|---|---|---|---|---|---|
| 0 | Bob | Smith | United States | 24 | Bob Smith |
| 6 | Derek | Ward | Switzerland | 25 | Derek Ward |


# Modifying data
Below is an example of how you can replace the values in a `DataFrame`. It replaces all instances of the value `USA` in the `Country` column with the value `United States`.

In [0]:
table = table.replace({'Country': 'USA'}, 'United States') 

In [0]:
table

|   | First Name | Last Name | Country | age | Full Name |
|---|---|---|---|---|---|
| 0 | Bob | Smith | United States | 24 | Bob Smith |
| 1 | Alice | Williams | Canada | 23 | Alice Williams |
| 2 | Malcolm | Jone | England | 22 | Malcolm Jone |
| 3 | Felix | Brown | United States | 23 | Felix Brown |
| 4 | Alex | Cooper | Poland | 23 | Alex Cooper |
| 5 | Tod | Campbell | United States | 22 | Tod Campbell |
| 6 | Derek | Ward | Switzerland | 25 | Derek Ward |

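The same clean-up can also be done on a single column with `Series.replace`, passing a dict that maps each old value to its new value; keys that never occur are simply ignored. A sketch on three rows of the table; the `'U.S.'` key is a hypothetical extra spelling that does not appear in this dataset.

```python
import pandas as pd

table = pd.DataFrame({
    'First Name': ['Bob', 'Felix', 'Malcolm'],
    'Country': ['United States', 'USA', 'England'],
})

# Map every variant spelling to one canonical value ('U.S.' is hypothetical)
cleaned = table['Country'].replace({'USA': 'United States', 'U.S.': 'United States'})
table['Country'] = cleaned  # write the cleaned column back

print(table['Country'].tolist())  # ['United States', 'United States', 'England']
```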

In [43]:
# Enumerate all 6 * 6 * 6 = 216 outcomes of rolling three six-sided dice
# and record the sum of each outcome
sums = []
for i in range(1, 7):
  for j in range(1, 7):
    for k in range(1, 7):
      sums.append(i + j + k)

# Mean of the sums
mean = sum(sums) / len(sums)
print(mean)

# Population variance of the sums (divide by N, not N - 1)
square_total = 0
for x in sums:
  square_total += (x - mean) ** 2

variance = square_total / len(sums)
print(variance)

10.5
8.75

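The same mean and variance can be computed with a `pandas.Series`. A minimal sketch, assuming the goal is the distribution of the sum of three fair six-sided dice; note the `ddof` argument: the loop above divides by N, while `Series.var()` divides by N - 1 by default.

```python
import itertools

import pandas as pd

# Every ordered outcome of three six-sided dice, and the sum of each outcome
sums = pd.Series([sum(roll) for roll in itertools.product(range(1, 7), repeat=3)])

print(len(sums))         # 216 outcomes
print(sums.mean())       # 10.5
print(sums.var(ddof=0))  # 8.75, the population variance, matching the loop above
print(sums.var())        # default ddof=1 (sample variance), slightly larger
```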