# Class

## Intro
Why doing mathematical operations isn't intuitive in Python without a library.

In [1]:
my_list = [1, 4, 5]

In [2]:
my_list * 3

[1, 4, 5, 1, 4, 5, 1, 4, 5]

In [3]:
# my_list + 3    # raises a TypeError; left commented out to illustrate the point

In [4]:
# Instead, you'll typically use a for loop or list comprehension to accomplish mathematical operations
[n * 3 for n in my_list]

[3, 12, 15]

Writing this out every time is annoying, and we would need to do it a lot in data analysis.
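
To put the three behaviours above side by side, here is a small sketch using only the standard library. The names `repeated`, `add_failed`, `scaled`, and `shifted` are made up for this illustration.

```python
my_list = [1, 4, 5]

# Multiplying a list by an integer repeats the list; it does not scale the values.
repeated = my_list * 3

# Adding an integer to a list is not defined, so Python raises a TypeError.
try:
    my_list + 3
    add_failed = False
except TypeError:
    add_failed = True

# Element-wise arithmetic needs an explicit loop or a list comprehension.
scaled = [n * 3 for n in my_list]
shifted = [n + 3 for n in my_list]

print(repeated)    # [1, 4, 5, 1, 4, 5, 1, 4, 5]
print(add_failed)  # True
print(scaled)      # [3, 12, 15]
print(shifted)     # [4, 7, 8]
```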

## Pandas
Pandas was created to handle mathematical operations and data analysis. Note that this library is not very Pythonic and does things its own way.

In [5]:
# import libraries
import pandas as pd    # pd is the conventional short alias for the pandas library
import random          # Generates numbers that are random for our purposes, but reproducible
random.seed(10)        # Using this specific seed to reproduce results when the notebook is rerun
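
A quick way to convince yourself that seeding makes the numbers reproducible: reseed with the same value and draw again. This is only a sketch; `first_run` and `second_run` are names invented for the example.

```python
import random

random.seed(10)
first_run = [random.randint(-20, 40) for _ in range(5)]

# Reseeding with the same value restarts the same pseudo-random sequence.
random.seed(10)
second_run = [random.randint(-20, 40) for _ in range(5)]

print(first_run == second_run)  # True
```

This is also why running the cell that draws the numbers a second time, without reseeding, gives different values: the generator simply continues from where it left off.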

### Pandas Series

In [6]:
temp_cs = [random.randint(-20, 40) for i in range(20)]    # Create 20 random integers between -20 and 40 (both ends included)

In [7]:
print(temp_cs)    # If you only run the random.seed once, you should have the same list as everyone else in the class

[16, -18, 7, 10, 16, -20, -7, 9, 32, 11, 32, -3, 21, 31, -10, -18, 13, 11, 0, -16]


We want to convert `temp_cs` from Celsius to Fahrenheit. The formula is `F = (C * 9/5) + 32`.

In [8]:
# You will get an error because of the issues listed at the top of this notebook.
# temp_fahr = (temp_cs * 9/5) + 32

We should use `pandas.Series` to operate on one-dimensional data.

In [9]:
temp_cs_series = pd.Series(temp_cs)    # convert temp_cs list to a pandas Series in order to use pandas

In [10]:
temps_fahrenheit = (temp_cs_series * 9/5) + 32

In [11]:
temps_fahrenheit.values

array([60.8, -0.4, 44.6, 50. , 60.8, -4. , 19.4, 48.2, 89.6, 51.8, 89.6,
       26.6, 69.8, 87.8, 14. , -0.4, 55.4, 51.8, 32. ,  3.2])

In [12]:
temps_fahrenheit    # Series with indices on the left, and values on the right

0     60.8
1     -0.4
2     44.6
3     50.0
4     60.8
5     -4.0
6     19.4
7     48.2
8     89.6
9     51.8
10    89.6
11    26.6
12    69.8
13    87.8
14    14.0
15    -0.4
16    55.4
17    51.8
18    32.0
19     3.2
dtype: float64
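
To check the conversion on values where the answer is easy to verify by hand, here is a minimal sketch with three made-up temperatures (freezing, boiling, and -40, where the two scales meet). It also shows that the Series arithmetic matches what a list comprehension would give.

```python
import pandas as pd

# Hypothetical sample temperatures in Celsius, chosen because the results are well known.
celsius = pd.Series([0, 100, -40])

# Arithmetic on a Series is applied to every element at once.
fahrenheit = (celsius * 9 / 5) + 32

# The same conversion written by hand with a list comprehension.
by_hand = [(c * 9 / 5) + 32 for c in [0, 100, -40]]

print(fahrenheit.tolist())             # [32.0, 212.0, -40.0]
print(fahrenheit.tolist() == by_hand)  # True
```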

## Another example

In [13]:
grades = [random.randint(4, 10) for i in range(40)]

In [14]:
print(grades)

[5, 9, 6, 4, 7, 10, 5, 8, 6, 7, 7, 6, 10, 9, 6, 7, 5, 9, 6, 9, 6, 5, 7, 10, 10, 5, 7, 8, 7, 4, 8, 4, 5, 5, 5, 6, 8, 6, 10, 5]


In [15]:
# Let's turn that into a series
grades_series = pd.Series(grades)
print(type(grades_series))

<class 'pandas.core.series.Series'>


In [16]:
print(grades_series)    # indices on the left, values on the right (again)

0      5
1      9
2      6
3      4
4      7
5     10
6      5
7      8
8      6
9      7
10     7
11     6
12    10
13     9
14     6
15     7
16     5
17     9
18     6
19     9
20     6
21     5
22     7
23    10
24    10
25     5
26     7
27     8
28     7
29     4
30     8
31     4
32     5
33     5
34     5
35     6
36     8
37     6
38    10
39     5
dtype: int64


In [17]:
# Just see the values, rather than the indices
grades_series.values

array([ 5,  9,  6,  4,  7, 10,  5,  8,  6,  7,  7,  6, 10,  9,  6,  7,  5,
        9,  6,  9,  6,  5,  7, 10, 10,  5,  7,  8,  7,  4,  8,  4,  5,  5,
        5,  6,  8,  6, 10,  5])

In [18]:
# Just see the index
grades_series.index

RangeIndex(start=0, stop=40, step=1)
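
The two halves of a Series can be pulled apart and turned back into plain Python lists when needed. A small sketch, using a made-up three-element Series:

```python
import pandas as pd

s = pd.Series([5, 9, 6])   # hypothetical sample values

values = s.values          # the data, as a NumPy array
labels = list(s.index)     # the default index, converted to a plain list
as_list = s.tolist()       # the data as a plain Python list

print(type(values).__name__)  # ndarray
print(labels)                 # [0, 1, 2]
print(as_list)                # [5, 9, 6]
```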

Let's sort the values with the `sort_values` method. The index doesn't really help us that much in this context.

In [19]:
grades_series.sort_values(ascending=False)

38    10
5     10
24    10
23    10
12    10
1      9
13     9
19     9
17     9
30     8
36     8
27     8
7      8
28     7
26     7
22     7
15     7
10     7
9      7
4      7
18     6
14     6
11     6
35     6
8      6
37     6
2      6
20     6
32     5
34     5
33     5
0      5
25     5
21     5
16     5
6      5
39     5
31     4
29     4
3      4
dtype: int64
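
Two things are easy to miss in the output above: sorting returns a new Series and leaves the original alone, and each value keeps its original index label. If the old labels are not useful, `reset_index(drop=True)` renumbers them. A sketch with made-up scores (all different, so the order is unambiguous):

```python
import pandas as pd

scores = pd.Series([5, 9, 6, 10])   # hypothetical sample values

ranked = scores.sort_values(ascending=False)

print(ranked.tolist())       # [10, 9, 6, 5]
print(list(ranked.index))    # [3, 1, 2, 0]  (labels travel with their values)
print(scores.tolist())       # [5, 9, 6, 10] (the original is unchanged)

# Throw away the old labels and number the sorted values from 0 again.
renumbered = ranked.reset_index(drop=True)
print(list(renumbered.index))  # [0, 1, 2, 3]
```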

In class, we grabbed random names from https://www.randomlists.com/. For example, I grabbed from here https://www.randomlists.com/random-names?qty=20. You can copy and paste into Jupyter notebook after manipulating these names a bit (see below).

It is worth looking at how to create a list like this in your text editor (TODO): multi-cursor editing, joining lines, and wrapping the result at 80 characters.

In [20]:
students = ["Molly Church", "Gloria Fischer", "Ashly Stuart", "Raegan Hanson",
            "Zechariah Hubbard", "Payten Colon", "Ainsley Vargas", "Philip Sullivan",
            "Emery Clarke", "Emmy Irwin", "Reyna Donovan", "Elaine Burgess",
            "Abram Hubbard", "Quincy Kane", "Barrett Duffy", "Pranav Esparza",
            "Darion Castaneda", "Erin Prince"]
students

['Molly Church',
 'Gloria Fischer',
 'Ashly Stuart',
 'Raegan Hanson',
 'Zechariah Hubbard',
 'Payten Colon',
 'Ainsley Vargas',
 'Philip Sullivan',
 'Emery Clarke',
 'Emmy Irwin',
 'Reyna Donovan',
 'Elaine Burgess',
 'Abram Hubbard',
 'Quincy Kane',
 'Barrett Duffy',
 'Pranav Esparza',
 'Darion Castaneda',
 'Erin Prince']

In [21]:
grades_series.index = students    # fails: there are 40 grades but only 18 names

ValueError: Length mismatch: Expected axis has 40 elements, new values have 18 elements
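
The error says that a new index must have exactly one label per value. One possible way around it, shown here purely as an assumption and not as what was done in class, is to keep only as many values as there are names. The names and scores below are invented for the sketch.

```python
import pandas as pd

names = ["Ana", "Ben", "Cy"]          # hypothetical names
scores = pd.Series([5, 9, 6, 10])     # hypothetical values, one more than there are names

# Assigning an index of the wrong length raises a ValueError.
try:
    scores.index = names
    mismatch = False
except ValueError:
    mismatch = True

# Assumption for illustration: keep the first len(names) values so the lengths agree.
by_name = pd.Series(scores.iloc[:len(names)].tolist(), index=names)

print(mismatch)        # True
print(by_name["Ben"])  # 9
```

Once the index holds names, a value can be looked up by label, as in `by_name["Ben"]`.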

## Filtering data

In [None]:
grades_series > 7     # returns Boolean values to use as a filter
# grades_series[grades_series > 7]    # gives the grades greater than 7
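
A comparison like the one above only works on a Series, not on a plain list, and it produces a Series of `True`/`False` values (a mask). Here is a minimal sketch of both steps with made-up scores:

```python
import pandas as pd

scores = pd.Series([5, 9, 6, 10, 8])   # hypothetical sample values

# Step 1: the comparison gives one Boolean per element.
mask = scores > 7
print(mask.tolist())            # [False, True, False, True, True]

# Step 2: indexing with the mask keeps only the rows where it is True.
selected = scores[mask]
print(selected.tolist())        # [9, 10, 8]
print(list(selected.index))     # [1, 3, 4]

# True counts as 1, so summing the mask counts the matches.
print(int(mask.sum()))          # 3
```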

In [None]:
condition = (grades_series > 6) & (grades_series < 10)
grades_series[condition]
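
When combining conditions, each comparison needs its own parentheses and the element-wise `&` operator; Python's `and` does not work on a Series. A sketch with made-up scores, which also shows `Series.between` as an alternative for range checks (it includes both ends by default, so for integers `between(7, 9)` selects the same rows):

```python
import pandas as pd

scores = pd.Series([5, 9, 6, 10, 8])   # hypothetical sample values

# Element-wise AND of two masks.
combined = (scores > 6) & (scores < 10)
print(scores[combined].tolist())        # [9, 8]

# The keyword `and` asks for a single True/False from a whole Series and fails.
try:
    (scores > 6) and (scores < 10)
    and_failed = False
except ValueError:
    and_failed = True
print(and_failed)                       # True

# Same selection for integer data, using an inclusive range.
in_range = scores.between(7, 9)
print(scores[in_range].tolist())        # [9, 8]
```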

In [None]:
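# & binds more tightly than |, so the extra parentheses only make the existing grouping explicit
condition3 = (grades_series < 6) | ((grades_series > 8) & (grades_series != 10))
grades_series[condition3]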
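
Mixing `|` and `&` in one expression is a common source of surprises, because `&` is evaluated before `|`. The sketch below uses made-up scores and a made-up third condition (even numbers) to show that the grouping can change the result:

```python
import pandas as pd

scores = pd.Series([4, 5, 7, 9, 10])   # hypothetical sample values

low = scores < 6
high = scores > 8
even = scores % 2 == 0                 # hypothetical extra condition

# Without extra parentheses, & is applied first: low | (high & even).
implicit = low | high & even
and_first = low | (high & even)
or_first = (low | high) & even

print(scores[implicit].tolist())    # [4, 5, 10]
print(scores[and_first].tolist())   # [4, 5, 10]
print(scores[or_first].tolist())    # [4, 10]
```

Writing the parentheses out, even when they match the default, makes the intent obvious to the next reader.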

In [None]:
condition5 = grades_series.isin([4, 5, 9])
grades_series[condition5]
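
`isin` is shorthand for a chain of `==` comparisons joined with `|`, and the resulting mask can be flipped with `~` to select everything else. A small sketch with made-up scores:

```python
import pandas as pd

scores = pd.Series([4, 5, 7, 9, 10])   # hypothetical sample values

wanted = scores.isin([4, 5, 9])

# Equivalent mask written out by hand.
by_hand = (scores == 4) | (scores == 5) | (scores == 9)

print(scores[wanted].tolist())               # [4, 5, 9]
print(wanted.tolist() == by_hand.tolist())   # True

# ~ inverts the mask: everything that is NOT in the list.
print(scores[~wanted].tolist())              # [7, 10]
```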