# Numpy and Pandas

## Numpy:
1. What is Numpy?
2. What are Numpy arrays and why are they useful?
3. How to use Numpy and create Numpy arrays
4. Basic and universal functions on Numpy arrays
5. Practice

### Numpy: 
> Numpy is short for *Numerical Python*. As the name suggests, it is a collection of functions, classes and methods for efficient mathematical computation.
> Numpy is not part of the Python Standard Library. It does not come with the standard Python distribution, so it has to be installed with the command below:
```bash
pip install numpy
```
> If you want to install the module from inside a *Jupyter Notebook*, prefix the command with `!`:
```python
!pip install numpy
```

> Numpy belongs to the wider scientific Python ecosystem. The ***"SciPy"*** library (short for *Scientific Python*) is built on top of Numpy and adds more advanced scientific routines.

> To learn more about these libraries, refer to the official documentation:

- [Numpy Documentation](https://docs.scipy.org/doc/numpy/user/quickstart.html)

- [Scipy Documentation](https://docs.scipy.org/doc/scipy/reference/)

#### Numpy Array:

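> Numpy arrays, also called *'ndarray'* (n-dimensional array) objects, are a powerful data structure provided by the Numpy library for working with homogeneous data, i.e. data in which every element has the same type.

> Because all elements share one type, Numpy can typically store them in a single contiguous block of memory and process them with compiled code. This makes array operations much faster and more memory-efficient than equivalent loops over Python lists.

> Arrays can have any number of dimensions: 1-D (vectors), 2-D (matrices) and higher.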
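A small sketch of both properties. When a list mixes integers and floats, Numpy converts every element to one common type. Nested lists produce a multi-dimensional array. The variable names are only illustrative.

```python
import numpy as np

# Mixed ints and floats are upcast to a single common type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64

# A nested list becomes a 2-D array (2 rows, 3 columns)
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.ndim)   # 2
print(matrix.shape)  # (2, 3)
```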

#### Using Numpy

After installing Numpy, we must import the library before we can use it:

```python
import numpy as np
```

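- *import* -> the Python statement used to load a library (module)
- *numpy* -> the library name
- *as* -> the keyword that sets an alias for the library being imported
- *np* -> the alias; `np` is the conventional alias used throughout the Numpy community and documentation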


#### Creating Array

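```python
import numpy as np

a = [1,2,3,4,5]

# Convert the regular Python list into a Numpy array
b = np.array(a)

#or

# np.asarray() also converts the list (and avoids a copy if 'a' is already an array).
# Note: np.ndarray(a) is NOT equivalent. Its first argument is the array's shape,
# so it would create an uninitialised array of shape (1, 2, 3, 4, 5).
b = np.asarray(a)
```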
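Numpy also provides helper functions that build an array of a given shape directly, instead of converting a list. This is a minimal sketch using `np.zeros`, `np.ones` and `np.full`, which are all standard Numpy functions. The shapes and the fill value are arbitrary.

```python
import numpy as np

zeros = np.zeros(3)                # 1-D array of three 0.0 values (float64 by default)
ones = np.ones((2, 3), dtype=int)  # 2 rows x 3 columns of integer 1s
sevens = np.full((2, 2), 7)        # 2 x 2 array filled with 7

print(zeros)
print(ones)
print(sevens)
```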

```python
a = [51,26,13,43,59]

b = np.array(a)

print(b)
```
```
[51 26 13 43 59]
```


- Here 'b' is a Numpy array object created from the regular Python list 'a'
- 'b' is a one-dimensional array with 5 elements
- We can use the '.shape' attribute of a Numpy array to check its shape, i.e. the size along each dimension

```python
print(b.shape)
```
```
(5,)
```


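- The data type of the array elements can be checked with the '.dtype' attribute. The default integer type depends on the platform and Numpy version; 'int64' is typical on 64-bit Linux and macOS.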

```python
b.dtype
```
```
dtype('int64')
```
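If you need a specific type, you can request it with the `dtype` argument when you create the array. You can also convert an existing array with `.astype()`, which returns a new array. Here is a short sketch using the same values as above.

```python
import numpy as np

a = [51, 26, 13, 43, 59]

as_float = np.array(a, dtype=np.float64)
print(as_float.dtype)    # float64

# astype() returns a converted copy; the original array keeps its dtype
small_ints = as_float.astype(np.int32)
print(small_ints.dtype)  # int32
print(as_float.dtype)    # float64
```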

##### Basic Functions on Arrays:

- reshape()
- arange()

are two of the most commonly used basic Numpy functions.

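- You can change the shape of an array with its `.reshape()` method, which takes the new shape as arguments, e.g. (rows, columns) for a 2-D result. The new shape must contain the same total number of elements. `reshape()` returns a new array (often a view of the same data) and does not modify the original array in place.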

```python
# Changing the shape to 1 row and 5 Columns
b.reshape(1,5)
```
```
array([[51, 26, 13, 43, 59]])
```

```python
# Changing the shape to 5 rows and 1 Column
b.reshape(5,1)
```
```
array([[51],
       [26],
       [13],
       [43],
       [59]])
```

```python
# Reshape to a 1-D array of 5 elements (the original shape;
# the earlier reshape() calls never modified 'b' itself)
b.reshape(5,)
```
```
array([51, 26, 13, 43, 59])
```
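This short sketch illustrates the points above. `reshape()` returns a new array and leaves 'b' unchanged. Passing `-1` asks Numpy to infer that dimension. A shape with a different number of elements raises a `ValueError`.

```python
import numpy as np

b = np.array([51, 26, 13, 43, 59])

column = b.reshape(5, 1)  # a new 5 x 1 array
print(column.shape)       # (5, 1)
print(b.shape)            # (5,)  'b' itself is unchanged

# -1 lets Numpy work out that dimension from the total number of elements
row = b.reshape(1, -1)
print(row.shape)          # (1, 5)

# 2 x 3 = 6 elements, but 'b' has 5, so this fails
try:
    b.reshape(2, 3)
except ValueError as err:
    print("ValueError:", err)
```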

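`np.arange(n)` returns an array of evenly spaced integers from 0 up to, but not including, `n`, like Python's built-in `range()`. Here it creates one index per element of 'b':

```python
c = np.arange(len(b))

print(c)
```
```
[0 1 2 3 4]
```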


```python
print(type(c))
print(c.dtype)
print(c.shape)
```
```
<class 'numpy.ndarray'>
int64
(5,)
```
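Like `range()`, `np.arange()` also accepts start, stop and step arguments. Here is a brief sketch with arbitrary numbers. With a floating-point step, rounding can make the number of elements hard to predict. For an exact count of evenly spaced points, the Numpy documentation recommends `np.linspace()`.

```python
import numpy as np

# np.arange(start, stop, step): stop is excluded, like range()
print(np.arange(5))         # [0 1 2 3 4]
print(np.arange(2, 10, 3))  # [2 5 8]

# A float step also works
quarters = np.arange(0, 1, 0.25)
print(quarters)

# Exactly 5 evenly spaced points from 0 to 1, both ends included
points = np.linspace(0, 1, 5)
print(points)
```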


##### Universal Functions on Arrays:

1. Addition
2. Multiplication
3. min()
4. max()

> Universal functions (ufuncs) are important because they work element by element. The operation is applied to each individual element, or to each pair of corresponding elements, of the arrays involved, and the result is a new array. Aggregate functions such as min() and max() instead reduce a whole array to a single value.

> For example, if 'b' and 'c' are Numpy arrays of the same shape, b + c results in another Numpy array containing the sums of the corresponding elements:

```python
print(b+c)
```
```
[51 27 15 46 63]
```


> By contrast, the + operator on two Python lists concatenates them into a new list:

```python
d = [1,2,3,4]

e = [5,6,7,8]

print(d+e)
```
```
[1, 2, 3, 4, 5, 6, 7, 8]
```


Multiplying an array by a scalar applies the multiplication to every element:

```python
PI = 3.14

print(b * PI)
```
```
[160.14  81.64  40.82 135.02 185.26]
```


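Multiplying the list 'd' by a float raises an error instead, because a list can only be multiplied by an integer (which repeats the list):

```python
print(d * PI)
```
```
TypeError: can't multiply sequence by non-int of type 'float'
```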
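The operators above correspond to named ufuncs such as `np.add` and `np.multiply`. Other element-wise functions, like `np.sqrt`, work the same way. This sketch reuses the values of 'b', 'c' and 'd' from above. It also shows one way around the `TypeError`: convert the list to an array first.

```python
import numpy as np

b = np.array([51, 26, 13, 43, 59])
c = np.arange(5)

total = np.add(b, c)          # same as b + c
doubled = np.multiply(b, 2)   # same as b * 2; the scalar is applied to every element
roots = np.sqrt(np.array([1, 4, 9]))

print(total)    # [51 27 15 46 63]
print(doubled)  # [102  52  26  86 118]
print(roots)    # [1. 2. 3.]

# Aggregations reduce the whole array to a single value
print(np.min(b), np.max(b))  # 13 59

# To scale a list by a float, convert it to an array first
d = [1, 2, 3, 4]
scaled = np.array(d) * 3.14
print(scaled)

# A list times an int repeats the list (no arithmetic on the elements)
print(d * 2)  # [1, 2, 3, 4, 1, 2, 3, 4]
```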

##### Slicing over the Arrays:

```python
# Create an Array with 1500 elements in it
c = np.arange(1,1501)

# Reshape it to contain 150 Rows and 10 Columns => 150 * 10 = 1500 elements
c = c.reshape(150,10)
```

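`c[:5,:5]` selects the first 5 rows and the first 5 columns, i.e. the top-left 5 x 5 block of the array:

```python
print(c[:5,:5])
```
```
[[ 1  2  3  4  5]
 [11 12 13 14 15]
 [21 22 23 24 25]
 [31 32 33 34 35]
 [41 42 43 44 45]]
```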
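Each dimension takes its own `start:stop:step` slice, and negative indices count from the end. The sketch below rebuilds the same 150 x 10 array. It also shows a detail that often surprises beginners: a slice is a view of the original data, so writing to it changes the original array, unless you take a `.copy()`.

```python
import numpy as np

c = np.arange(1, 1501).reshape(150, 10)

print(c[0])          # first row: [ 1  2  3  4  5  6  7  8  9 10]
print(c[:3, 0])      # first column of the first 3 rows: [ 1 11 21]
print(c[-1, -3:])    # last 3 values of the last row: [1498 1499 1500]
print(c[:4:2, ::3])  # rows 0 and 2, every 3rd column

# Slices are views: writing to a slice changes the original array
block = c[:2, :2]
block[0, 0] = -1
print(c[0, 0])       # -1

# Use .copy() when an independent array is needed
safe = c[:2, :2].copy()
safe[0, 0] = 100
print(c[0, 0])       # still -1
```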


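Numpy also provides functions that summarise a whole array: its minimum, maximum, average (arithmetic mean), main diagonal and total number of elements:

```python
print(np.min(c))
print(np.max(c))
print(np.average(c))
print(np.diagonal(c))  # the elements c[0,0], c[1,1], ..., c[9,9]
print(np.size(c))      # 150 * 10 elements
```
```
1
1500
750.5
[  1  12  23  34  45  56  67  78  89 100]
1500
```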
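The calls above reduce the whole 150 x 10 array to one value. Most of these functions also accept an `axis` argument. With `axis=0` they compute one result per column, and with `axis=1` one result per row. The sketch uses a small 2 x 3 array so you can check the results by hand.

```python
import numpy as np

m = np.arange(1, 7).reshape(2, 3)
# m is:
# [[1 2 3]
#  [4 5 6]]

print(np.max(m))              # 6        over all elements
print(np.max(m, axis=0))      # [4 5 6]  one result per column
print(np.sum(m, axis=1))      # [ 6 15]  one result per row
print(np.average(m, axis=1))  # [2. 5.]
print(m.mean(), m.size)       # 3.5 6    the same operations exist as array methods

# np.average (unlike np.mean) also accepts weights
weighted = np.average(np.array([1, 2, 3]), weights=[3, 1, 0])
print(weighted)               # 1.25, i.e. (1*3 + 2*1 + 3*0) / 4
```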


## Pandas:

> Pandas is another Python library that is not part of the Standard Library, so it also has to be installed:

```bash
pip install pandas
```

> Pandas provides a data structure called the *DataFrame*. A DataFrame is a two-dimensional, labelled table of rows and columns, similar to a table in a database or a sheet in Excel / spreadsheets. Pandas is built on top of Numpy, which makes its operations fast even on large datasets.

### Creating Dataframe:

> One of the most common ways to create a Pandas DataFrame is to pass a dictionary to the pandas.DataFrame() constructor. Each key becomes a column name, and each list of values becomes that column's data.

```python
# Before working with the Pandas library, first it needs to be imported.
import pandas as pd
```

```python
example_dict = {'Name':['Talent Sprint','WISE1','WISE2'], 'Address':['Gacchibowli','BVRITH','SVECW']}

example_dataframe = pd.DataFrame(example_dict)

print(example_dataframe)
```
```
            Name      Address
0  Talent Sprint  Gacchibowli
1          WISE1       BVRITH
2          WISE2        SVECW
```
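A dictionary of lists is not the only input that `pd.DataFrame()` accepts. Two other common forms are a list of dictionaries (one dictionary per row) and a list of lists with explicit column names. This sketch reuses rows from the example above.

```python
import pandas as pd

# One dictionary per row; the keys become the column names
rows = [
    {"Name": "Talent Sprint", "Address": "Gacchibowli"},
    {"Name": "WISE1", "Address": "BVRITH"},
]
df = pd.DataFrame(rows)

# A list of lists needs the column names passed separately
from_lists = pd.DataFrame([["WISE2", "SVECW"]], columns=["Name", "Address"])

print(df)
print(df["Name"].tolist())  # ['Talent Sprint', 'WISE1'], a single column
print(from_lists.shape)     # (1, 2)
```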


Pandas is well known for its built-in functions, which make working with data much easier. This starts with importing data from an external CSV file using pd.read_csv():

```python
data_frame = pd.read_csv('project_files/Admission_Predict.csv')

print(type(data_frame))
```
```
<class 'pandas.core.frame.DataFrame'>
```
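You can try `pd.read_csv()` without the project file by reading CSV text from memory with `io.StringIO`. The sketch below uses the first three rows of the admissions data shown in this section. It keeps only four of the columns, for brevity. The `index_col` argument shows how one column can be used as the row labels.

```python
import io
import pandas as pd

# First three rows (a subset of the columns) of the admissions data in this section
csv_text = """Serial No.,GRE Score,TOEFL Score,CGPA
1,337,118,9.65
2,324,107,8.87
3,316,104,8.00
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))
print(type(sample))  # <class 'pandas.core.frame.DataFrame'>
print(sample.shape)  # (3, 4)
print(sample.dtypes)

# Use 'Serial No.' as the row index instead of the default 0, 1, 2, ...
indexed = pd.read_csv(io.StringIO(csv_text), index_col="Serial No.")
print(indexed.shape)  # (3, 3)
```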


```python
print(data_frame)
```
```
     Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR   CGPA  \
0             1        337          118                  4  4.5   4.5  9.65
1             2        324          107                  4  4.0   4.5  8.87
2             3        316          104                  3  3.0   3.5  8.00
3             4        322          110                  3  3.5   2.5  8.67
4             5        314          103                  2  2.0   3.0  8.21
5             6        330          115                  5  4.5   3.0  9.34
6             7        321          109                  3  3.0   4.0  8.20
7             8        308          101                  2  3.0   4.0  7.90
8             9        302          102                  1  2.0   1.5  8.00
9            10        323          108                  3  3.5   3.0  8.60
10           11        325          106                  3  3.5   4.0  8.40
11           12        327          111
```

```python
# Check first 5 rows of data
print(data_frame.head())
```
```
   Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR   CGPA  \
0           1        337          118                  4  4.5   4.5  9.65
1           2        324          107                  4  4.0   4.5  8.87
2           3        316          104                  3  3.0   3.5  8.00
3           4        322          110                  3  3.5   2.5  8.67
4           5        314          103                  2  2.0   3.0  8.21

   Research  Chance of Admit
0         1              0.92
1         1              0.76
2         1              0.72
3         1              0.80
4         0              0.65
```


```python
# Check last 5 rows of data
print(data_frame.tail())
```
```
     Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR   CGPA  \
395         396        324          110                  3  3.5   3.5  9.04
396         397        325          107                  3  3.0   3.5  9.11
397         398        330          116                  4  5.0   4.5  9.45
398         399        312          103                  3  3.5   4.0  8.78
399         400        333          117                  4  5.0   4.0  9.66

     Research  Chance of Admit
395         1              0.82
396         1              0.84
397         1              0.91
398         0              0.67
399         1              0.95
```


```python
print(data_frame.shape)
```
```
(400, 9)
```

=> 400 rows and 9 columns (`shape` is a (rows, columns) tuple)
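`head()` and `tail()` return 5 rows by default, but both accept an optional row count. The sketch below builds a small DataFrame from the first five rows of the `head()` output above, using only four of its columns, so you can run it without the CSV file.

```python
import pandas as pd

# First five rows (a subset of the columns) copied from the head() output above
sample = pd.DataFrame({
    "GRE Score": [337, 324, 316, 322, 314],
    "TOEFL Score": [118, 107, 104, 110, 103],
    "CGPA": [9.65, 8.87, 8.00, 8.67, 8.21],
    "Research": [1, 1, 1, 1, 0],
})

print(sample.head(2))           # first 2 rows
print(sample.tail(1))           # last row
rows, cols = sample.shape       # shape unpacks into (rows, columns)
print(rows, cols)               # 5 4
print(sample.columns.tolist())  # ['GRE Score', 'TOEFL Score', 'CGPA', 'Research']
```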

#### Some commonly used functions on DataFrames:

```python
# Minimum values of each column
print(data_frame.min())
```
```
Serial No.             1.00
GRE Score            290.00
TOEFL Score           92.00
University Rating      1.00
SOP                    1.00
LOR                    1.00
CGPA                   6.80
Research               0.00
Chance of Admit        0.34
dtype: float64
```


```python
# Maximum values of each column
print(data_frame.max())
```
```
Serial No.           400.00
GRE Score            340.00
TOEFL Score          120.00
University Rating      5.00
SOP                    5.00
LOR                    5.00
CGPA                   9.92
Research               1.00
Chance of Admit        0.97
dtype: float64
```


```python
# Display summary statistics of each variable
# (count, mean, std, min, quartiles, max) with the describe() function
print(data_frame.describe())
```
```
       Serial No.   GRE Score  TOEFL Score  University Rating         SOP  \
count  400.000000  400.000000   400.000000         400.000000  400.000000
mean   200.500000  316.807500   107.410000           3.087500    3.400000
std    115.614301   11.473646     6.069514           1.143728    1.006869
min      1.000000  290.000000    92.000000           1.000000    1.000000
25%    100.750000  308.000000   103.000000           2.000000    2.500000
50%    200.500000  317.000000   107.000000           3.000000    3.500000
75%    300.250000  325.000000   112.000000           4.000000    4.000000
max    400.000000  340.000000   120.000000           5.000000    5.000000

             LOR         CGPA    Research  Chance of Admit
count  400.000000  400.000000  400.000000        400.000000
mean     3.452500    8.598925    0.547500          0.724350
std      0.898478    0.596317    0.498362          0.142609
min      1.000000    6.800000    0.000000          0.34000
```
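The same functions also work on a single column (a pandas Series), and `describe()` itself returns a DataFrame whose rows can be selected with `.loc`. This sketch uses three columns from the first five rows of the `head()` output shown earlier.

```python
import pandas as pd

# First five rows (three of the columns) from the head() output earlier in this section
sample = pd.DataFrame({
    "GRE Score": [337, 324, 316, 322, 314],
    "CGPA": [9.65, 8.87, 8.00, 8.67, 8.21],
    "Chance of Admit": [0.92, 0.76, 0.72, 0.80, 0.65],
})

print(sample.mean())              # mean of each column
print(sample["GRE Score"].max())  # 337, computed on one column (a Series)
print(sample["CGPA"].describe())  # summary statistics for one column

# describe() returns a DataFrame, so individual statistics can be selected by row label
print(sample.describe().loc[["mean", "max"]])
```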