### 1. Import NumPy and Pandas

In [29]:
import pandas as pd
import numpy as np

### 2a. Create a pd.Series with 10 elements

In [30]:
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
s

0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64

#### Return the 5th element

In [31]:
s[4]

5

#### Return the last element

In [32]:
s[9]

10

Did you intuitively try using -1? If so, you saw that, unlike with a Python list, this does not work on this pd.Series.  
Why?   
We get a KeyError, which means that -1 is not found in the index. We would get the same error if we used 10.    
So `s[...]` looks for elements in the index, not for positions. That means that if the index were ['a', 'b', 'c', ...], we could type s['a'] and it would return the value stored under the index label 'a'.  
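The difference between label-based and position-based lookups can be sketched as follows. This is a minimal illustration, not part of the exercise solution; the `letters` Series is made-up example data.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# .iloc selects by position, so negative positions work as they do in a list
last_by_position = s.iloc[-1]

# .loc selects by index label; here the labels are the integers 0..9
last_by_label = s.loc[9]

# With string labels, lookups go through those labels instead
letters = pd.Series([10, 20, 30], index=['a', 'b', 'c'])  # hypothetical example data
value_at_a = letters['a']

print(last_by_position, last_by_label, value_at_a)
```

Using `.iloc` for positions and `.loc` for labels makes the intent explicit and avoids surprises when the index is made of integers.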

#### Return the 7th-9th elements

In [33]:
s[6:9]

6    7
7    8
8    9
dtype: int64
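Slices behave differently depending on whether they go through positions or labels. The sketch below assumes the same Series `s` as above and shows that a position slice excludes its end point, while a label slice includes it.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Position-based slice: the end position (9) is excluded, as in a Python list
by_position = s.iloc[6:9]

# Label-based slice: the end label (8) is included
by_label = s.loc[6:8]

print(by_position.tolist())
print(by_label.tolist())
```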

### 2b. Create a 2D numpy array of size (10,4)

In [34]:
ar = np.random.random((10,4))
ar

array([[0.38451438, 0.88968762, 0.57598771, 0.82936463],
       [0.09531394, 0.12483737, 0.72357494, 0.20777913],
       [0.36245928, 0.94475876, 0.74223792, 0.42767057],
       [0.16597462, 0.52842608, 0.10917502, 0.70361658],
       [0.78450887, 0.85904413, 0.70221941, 0.9010161 ],
       [0.76475656, 0.75957701, 0.07690665, 0.09912849],
       [0.2779539 , 0.15247297, 0.10991839, 0.47912258],
       [0.32346691, 0.69717284, 0.42581536, 0.84637056],
       [0.46792695, 0.86147869, 0.06098851, 0.14581066],
       [0.64853801, 0.38685472, 0.46231367, 0.10714739]])

#### Use slicing and indexing to play around with your array. Explore what results you get with different combinations.

In [35]:
ar[1,1]

0.12483737344224399

In [36]:
ar[0,1]

0.8896876228206617

In [37]:
ar[:,2]

array([0.57598771, 0.72357494, 0.74223792, 0.10917502, 0.70221941,
       0.07690665, 0.10991839, 0.42581536, 0.06098851, 0.46231367])

In [38]:
ar[2,:]

array([0.36245928, 0.94475876, 0.74223792, 0.42767057])

In [39]:
# a 2D array is indexed as [row, column]: this is the last row, last column
ar[9,3]

0.10714738521429379
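A few more slicing combinations are sketched below. To make the results easy to read, this example uses a hypothetical array `demo` filled with the integers 0 to 39 instead of random floats.

```python
import numpy as np

# hypothetical array: 10 rows, 4 columns, values 0..39
demo = np.arange(40).reshape(10, 4)

first_two_rows = demo[:2, :]   # rows 0 and 1, all columns
last_column = demo[:, -1]      # all rows, last column
block = demo[2:4, 1:3]         # rows 2-3, columns 1-2
every_other_row = demo[::2]    # rows 0, 2, 4, 6, 8

print(first_two_rows.shape, last_column.shape, block.shape, every_other_row.shape)
print(block)
```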

### 3. Let's create a dataframe holding the grades of 50 students in 5 courses
- min grade: 0 
- max grade: 100
- columns: courses
- rows: students

In class we saw how to create a numpy array with random floats. Here is how to create one with random integers.

In the next cell we'll generate random numbers, which also means that everyone's results will be different.   
However, it would be better if your results were consistent: they would be reproducible, and it would be easier for me to identify any mistakes.   
To achieve that, we will set the seed in numpy.random, which makes the random numbers predictable.   
For a detailed explanation you can check the accepted answer in this post: https://stackoverflow.com/questions/21494489/what-does-numpy-random-seed0-do  
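As a small illustration of what setting the seed does, the sketch below draws five random integers twice, resetting the seed before each draw. The variable names are only for this example.

```python
import numpy as np

np.random.seed(5)
first_draw = np.random.randint(low=0, high=101, size=5)

# Resetting the same seed restarts the sequence of "random" numbers
np.random.seed(5)
second_draw = np.random.randint(low=0, high=101, size=5)

print(np.array_equal(first_draw, second_draw))
```

Both draws contain the same numbers, which is why everyone who runs the seeded cell gets the same array.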

For now, just run the next cell :)

In [40]:
np.random.seed(5)

# this function draws random integers from 0 to 100 (the upper bound, 101, is excluded), as many times as necessary to create an ndarray of size (50,5)
nd_array_radom_integer = np.random.randint(low=0, high=101, size=(50, 5))
nd_array_radom_integer

array([[ 99,  78,  61,  16,  73],
       [  8, 100,  62,  27,  30],
       [ 80,   7,  76,  15,  53],
       [ 80,  27,  44,  77,  75],
       [ 65,  47,  30,  84,  86],
       [ 18,   9,  41,  62,   1],
       [ 82,  16,  78,   5,  58],
       [  0,  80,   4,  36,  51],
       [ 27,  31,   2,  68,  38],
       [ 83,  19,  18,   7, 100],
       [ 30,  62,  11,  67,  65],
       [ 55,   3,  91,  78,  27],
       [ 29,  33,  89,  85,   7],
       [ 16,  94,  14,  90,  31],
       [  9,  38,  47,  16,   5],
       [ 34,  45,  59,  24,  13],
       [ 31,  32,  76,  44,   5],
       [ 14,  47,  94,  82,   0],
       [  7,  86,  16,  64,   8],
       [ 90,  44,  37,  94,  75],
       [  5,  22,  52,  69,  82],
       [ 60,  91,  29,  88,  97],
       [ 92,  79,  70,  35,  20],
       [ 49,  72,  32,  82,  13],
       [ 92,  18,  52,  81,  22],
       [ 58,  83,  92,  83,  49],
       [  4,  82,  36,  41,  20],
       [ 32,  10,  31,  15,  22],
       [ 70,   9,  63,  94,  14],
       [ 66,  

#### Create a pd.DataFrame from the nd_array_radom_integer and call it 'grades'

In [41]:
grades = pd.DataFrame(data=nd_array_radom_integer)
#grades

### 4. Use the list 'col_names' to rename the columns of 'grades'
Not sure how? Look into the pandas documentation.   
*Hint: Dictionary comprehension  (it's not the only solution, but it's a fun one)*

In [42]:
col_names = ['Algebra', 'History', 'Physics', 'Biology', 'Language']

In [45]:
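# rename() returns a new DataFrame; the result is only displayed here, so 'grades' itself keeps its integer column labels
# (assign the result back to 'grades' to make the new names permanent)
grades.rename(columns={0: 'Algebra', 1: 'History', 2: 'Physics', 3: 'Biology', 4: 'Language'})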

   Algebra  History  Physics  Biology  Language
0       99       78       61       16        73
1        8      100       62       27        30
2       80        7       76       15        53
3       80       27       44       77        75
4       65       47       30       84        86
5       18        9       41       62         1
6       82       16       78        5        58
7        0       80        4       36        51
8       27       31        2       68        38
9       83       19       18        7       100
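The dictionary-comprehension hint can be sketched as follows. This is one possible approach, shown on a hypothetical two-row frame `demo` that copies the first two rows of the grades; note that the result of `rename` has to be assigned to a name to be kept.

```python
import pandas as pd

col_names = ['Algebra', 'History', 'Physics', 'Biology', 'Language']

# hypothetical small frame with the default integer column labels 0..4
demo = pd.DataFrame([[99, 78, 61, 16, 73], [8, 100, 62, 27, 30]])

# Build {old_label: new_name} by pairing the current labels with col_names
mapping = {old: new for old, new in zip(demo.columns, col_names)}

# rename() returns a new DataFrame and leaves 'demo' unchanged
renamed = demo.rename(columns=mapping)

print(list(demo.columns))
print(list(renamed.columns))
```

Writing `demo = demo.rename(columns=mapping)` instead would replace the original frame with the renamed one.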


#### How many ways can you think of to check if you were successful? (In other words, what methods do you know that will contain the column names of a dataframe in their output?)

In [48]:
grades.columns

RangeIndex(start=0, stop=5, step=1)

In [52]:
grades.index

RangeIndex(start=0, stop=50, step=1)
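A few ways to inspect the column names are sketched below on a hypothetical one-row frame that already carries the course names. `head()`, `info()` and `describe()` also show the column names in their output.

```python
import pandas as pd

col_names = ['Algebra', 'History', 'Physics', 'Biology', 'Language']

# hypothetical one-row frame that already uses the course names
demo = pd.DataFrame([[99, 78, 61, 16, 73]], columns=col_names)

names_from_columns = demo.columns.tolist()  # the column Index as a plain list
names_from_list = list(demo)                # iterating a DataFrame yields its column labels
has_algebra = 'Algebra' in demo             # membership tests check the column labels

print(names_from_columns)
print(has_algebra)
```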

### 5. Use head, info and describe to get basic info about the dataframe. 

In [49]:
grades.head()

    0    1   2   3   4
0  99   78  61  16  73
1   8  100  62  27  30
2  80    7  76  15  53
3  80   27  44  77  75
4  65   47  30  84  86


In [50]:
grades.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   0       50 non-null     int32
 1   1       50 non-null     int32
 2   2       50 non-null     int32
 3   3       50 non-null     int32
 4   4       50 non-null     int32
dtypes: int32(5)
memory usage: 1.1 KB


In [51]:
grades.describe()

               0          1          2          3          4
count       50.0       50.0       50.0       50.0       50.0
mean       47.94       45.5      44.52      53.18      45.18
std    30.887161  30.335034  27.992885  31.137411  30.602581
min          0.0        0.0        2.0        1.0        0.0
25%        21.75       19.0      20.75       24.5       20.0
50%         49.5       42.0       37.5       64.0       46.0
75%         71.5      74.75      68.25      81.75       74.5
max        100.0      100.0       97.0       95.0      100.0


#### Find the median (not the mean) grade for Physics

In [54]:
grades[2].median()

37.5

### 6. Add a new column to the dataframe called 'Average'. It should be the mean of each row. 
*Hint: using axis=1 as an argument will calculate the mean across a row*

In [56]:
grades['Average'] = grades.mean(axis=1)

In [57]:
grades.head(2)

    0    1   2   3   4  Average
0  99   78  61  16  73     65.4
1   8  100  62  27  30     45.4


#### Check that it worked
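One way to check the result is to recompute the mean of a row by hand and compare. The sketch below does this on a hypothetical frame `demo` that holds the first two rows of the grades.

```python
import numpy as np
import pandas as pd

# hypothetical frame holding the first two rows of the grades
demo = pd.DataFrame([[99, 78, 61, 16, 73], [8, 100, 62, 27, 30]])

# axis=1 computes the mean across the columns, one value per row
demo['Average'] = demo.mean(axis=1)

# Recompute the first row by hand and compare with the new column
manual_first_row = (99 + 78 + 61 + 16 + 73) / 5
print(manual_first_row, demo.loc[0, 'Average'])
print(np.isclose(manual_first_row, demo.loc[0, 'Average']))
```

Keep in mind that running `mean(axis=1)` a second time would include the 'Average' column itself, so the calculation should only be applied to the course columns.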

### 7.  If 50% is the average needed to pass, how many students have passed? 

*Hint: you will need to select part of the dataframe and find the size of the selection*

In [61]:
passed_students = grades[grades['Average']>=50]

In [60]:
passed_students.count()

0          17
1          17
2          17
3          17
4          17
Average    17
dtype: int64
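`.count()` reports one number per column. To get a single number, the selection can be measured directly, as in this sketch with made-up averages.

```python
import pandas as pd

# made-up averages for four students
demo = pd.DataFrame({'Average': [65.4, 45.4, 50.0, 72.1]})

# Boolean mask: True where the student passed
passed_mask = demo['Average'] >= 50

n_passed_sum = passed_mask.sum()             # True counts as 1, False as 0
n_passed_len = len(demo[passed_mask])        # number of selected rows
n_passed_shape = demo[passed_mask].shape[0]  # same number, taken from the shape

print(n_passed_sum, n_passed_len, n_passed_shape)
```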

### 8. How many perfect grades (100) have been assigned to each class? 
*Hint: you can use the method .count()*

In [67]:
# keep only the cells equal to 100 (every other cell becomes NaN), then count the non-null cells in each column
grades[grades == 100].count()
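Counting a specific value per column can be sketched in two equivalent ways. The frame `demo` below is made-up data, chosen so the counts are easy to verify by eye.

```python
import pandas as pd

# made-up grades for three students in three courses
demo = pd.DataFrame({
    'Algebra': [100, 80, 100],
    'History': [78, 100, 7],
    'Physics': [61, 62, 76],
})

# Option 1: compare every cell with 100 and sum the True values per column
perfect_by_sum = (demo == 100).sum()

# Option 2: mask everything except 100 as NaN, then count non-null cells per column
perfect_by_count = demo[demo == 100].count()

print(perfect_by_sum.to_dict())
print(perfect_by_count.to_dict())
```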

### 9. Let's create a scatter plot for  'Biology'

#### First try to find the answer on your own. A problem will arise, try to think of a way to overcome it.
Whether you solved it or not, in the end, expand the contents of the next cell. 

If you look at the documentation for .plot.scatter() you will see that the first arguments it takes are:   
- x : int or str. The **column name or column position** to be used as horizontal coordinates for each point.
- y : int or str.   The column name or column position to be used as vertical  coordinates for each point.

So it needs to know what to put on the x and y axes. In other words, you need to plot 'Biology' against something, which in our case can be the student number (our index).   
To do that, all we have to do is add a new column to our dataframe. One way would be:   
`grades['Student']=range(0,50)`

But let's try something else:   
`grades.reset_index(inplace=True)`

This will create a new column called 'index' using the values of the index. The argument `inplace=True` means that the dataframe itself is modified, instead of a modified copy being returned.
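The effect of `inplace` can be sketched on a hypothetical three-row frame. Without it, `reset_index()` returns a new dataframe and leaves the original untouched; with `inplace=True`, the original is changed and nothing is returned.

```python
import pandas as pd

# hypothetical frame with three Biology grades
demo = pd.DataFrame({'Biology': [16, 27, 15]})

# Without inplace: a new DataFrame comes back, 'demo' is unchanged
with_index = demo.reset_index()
columns_before = list(demo.columns)

# With inplace=True: 'demo' itself gains the 'index' column and None is returned
returned = demo.reset_index(inplace=True)
columns_after = list(demo.columns)

print(columns_before, columns_after, returned)
```

With the 'index' column in place, a call such as `demo.plot.scatter(x='index', y='Biology')` has a column for each axis (plotting requires matplotlib to be installed).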

In [64]:
grades.reset_index(inplace=True)

In [65]:
# x: the 'index' column created by reset_index(); y: the column labelled 3, which holds the Biology grades
grades.plot.scatter(x='index', y=3)

### 10. Let's create a histogram for Algebra

Look at the next command, `.plot.hist()`.
##### A histogram is a representation of the distribution of data. 
We're using the argument `figsize` to make the plot bigger and `bins` to group the grades.    
10 bins means that we're splitting the grades into 10 equal-width groups (0-10, 10-20, 20-30, ... , 90-100).
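To see how grades are assigned to bins without drawing anything, `np.histogram` can be used, since it returns the counts and the bin edges directly. The grades below are made up, and the range is fixed to 0-100 as an assumption for this sketch.

```python
import numpy as np

# made-up grades, including values that sit exactly on bin edges
grades_demo = np.array([0, 5, 10, 15, 47, 50, 99, 100])

# 10 equal-width bins between 0 and 100
counts, edges = np.histogram(grades_demo, bins=10, range=(0, 100))

print(edges)
print(counts)
```

Each bin includes its left edge and excludes its right edge, except for the last bin, which also includes 100.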

In [66]:
# the column labelled 0 holds the Algebra grades
grades[0].plot.hist(figsize=(20,10), bins=10)

#### Change the numbers in figsize

#### Change the number of bins and observe how the plot changes.

So what is the correct number of bins?   
In the case above, splitting the grades into groups of 10 is reasonable. But in most cases it's not that simple. Luckily, the necessary math is already implemented in the functions that make histograms.   
Still, it's good to know and understand the basic differences between the methods.    
You can read more on the topic here: https://www.statisticshowto.com/choose-bin-sizes-statistics/
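As a rough illustration of those built-in rules, NumPy can compute the bin edges for a named rule through `np.histogram_bin_edges`. The sample below is made-up random data, not the grades from this notebook, and the three rule names are just a selection.

```python
import numpy as np

# made-up sample: 50 random integer grades between 0 and 100
rng = np.random.default_rng(5)
sample = rng.integers(0, 101, size=50)

# Ask NumPy how many bins each rule suggests for this sample
bins_per_rule = {}
for rule in ('sturges', 'fd', 'auto'):
    edges = np.histogram_bin_edges(sample, bins=rule)
    bins_per_rule[rule] = len(edges) - 1

print(bins_per_rule)
```

Different rules can suggest different bin counts for the same data, which is why it helps to try a few values and look at the resulting plot.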

## You're done!

Add, commit, and push your changes to this notebook to your repo and then submit a pull request. 