# Pandas
1. Pandas
2. Data structures
3. Pandas Series
    - Creating Series
    - Manipulating Series
4. Pandas DataFrames
    - Creating DataFrames
    - Manipulating DataFrames
5. Reading Data from different Sources

## 1. Pandas
- Pandas contains data structures and manipulation tools designed for data cleaning and analysis.
- While it adopts many idioms from _NumPy_, the biggest difference is that Pandas is designed for working with tabular or heterogeneous data. NumPy, by contrast, is best suited for working with homogeneous numerical array data.
- Its name is derived from "panel data", an econometrics term for multidimensional structured data sets.
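
The difference between homogeneous and heterogeneous data can be seen in a small sketch. The variable names and values below are made up for illustration: a NumPy array coerces mixed values to one common type, while a DataFrame keeps a separate dtype for each column.

```python
import numpy as np
import pandas as pd

# A NumPy array has a single dtype, so mixed values are coerced to a common type
# (here every element becomes a string).
mixed_array = np.array([1, 2.5, "three"])
print(mixed_array.dtype)

# A DataFrame keeps one dtype per column, so numbers stay numbers.
table = pd.DataFrame({"count": [1, 2], "ratio": [0.5, 1.5], "label": ["a", "b"]})
print(table.dtypes)
```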

### Pandas installation and import 
- Installation
`!pip install pandas`
- Import 
`import pandas as pd` 

In [3]:
# Import Pandas 
import pandas as pd

## 2. Data Structures
- Pandas has 2 data structures as follows:
1. A __Series__ is a 1-dimensional labeled array that can hold data of any type (integer, float, string, boolean, Python object, and so on). Its axis labels are collectively called an index.
2. A __DataFrame__ is a 2-dimensional labeled data structure with columns. Each column can have a different data type.
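
As a quick check of the dimensions described above, the sketch below builds one of each structure and inspects `ndim` and `shape`. The sample names and values are illustrative only.

```python
import pandas as pd

# A Series is 1-dimensional: values plus one index.
scores = pd.Series([60, 89], index=["Math", "React"])
print(scores.ndim, scores.shape)    # 1 (2,)

# A DataFrame is 2-dimensional: a row index plus named columns.
report = pd.DataFrame(
    {"marks": [60, 89], "passed": [True, True]},
    index=["Math", "React"],
)
print(report.ndim, report.shape)    # 2 (2, 2)
```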

## 3. Pandas Series 
- A Series is a one-dimensional labeled array capable of holding any data type. However, a single Series has one data type for all of its elements, similar to an array, a list, or a column in a table.
- It will assign a labeled index to each item in the Series. By default, each item will receive an index label from 0 to N, where N is the length of the Series minus one.
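
A minimal sketch of the default index, using a made-up three-element Series: when no index is given, the labels run from 0 to the length minus one.

```python
import pandas as pd

letters = pd.Series(["a", "b", "c"])

# With no index argument, pandas generates integer labels starting at 0.
print(letters.index)

# The last label equals the length of the Series minus one.
print(letters.index[-1] == len(letters) - 1)   # True
```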

### 3.1 Creating Series 
1. __To create a numeric series__.

In [4]:
# Create a numeric series  
numbers = range(1,100,5)
pd.Series(numbers)

0      1
1      6
2     11
3     16
4     21
5     26
6     31
7     36
8     41
9     46
10    51
11    56
12    61
13    66
14    71
15    76
16    81
17    86
18    91
19    96
dtype: int64

- The output is of type `int64`
- The row names are usually denoted as _"Index"_
2. __To create an object series__

In [5]:
string = "Hi" , "How " , "are " ,"you","?"
pd.Series(string)

0      Hi
1    How 
2    are 
3     you
4       ?
dtype: object

- Output is of type `object`
3. __To create a series with both numeric and string values__

In [6]:
# Create a series from an arbitrary list
pd.Series([365,'London',34.5,-34.5,'Happy Birthday'])

0               365
1            London
2              34.5
3             -34.5
4    Happy Birthday
dtype: object

- Here the numeric values are stored as objects. _A Series has a single data type, so when the values are mixed it stores all of them as `object`._
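
The sketch below, with illustrative values, suggests what `object` means in practice: the Series reports a single dtype, but each element is still the original Python object.

```python
import pandas as pd

mixed = pd.Series([365, "London", 34.5])

# The Series as a whole has one dtype: object.
print(mixed.dtype)

# Each element keeps its own Python type inside the object Series.
element_types = [type(value).__name__ for value in mixed]
print(element_types)   # ['int', 'str', 'float']
```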
4. __To set index values for a series__

In [7]:
marks = [60,89,74,86,100]

subject = ["Math" , "System design" , "Cloud Computing" , "Data Analysis" , "React"]

marks_series = pd.Series(marks, index=subject)
marks_series


Math                60
System design       89
Cloud Computing     74
Data Analysis       86
React              100
dtype: int64

- The index is added using the argument `index=`. The data type of the series continues to be numeric.
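
Once a Series has custom labels, values can be looked up by label as well as by position. This is a small sketch using a shortened version of the marks data; the variable name `marks_by_subject` is introduced here for illustration.

```python
import pandas as pd

marks_by_subject = pd.Series(
    [60, 89, 74],
    index=["Math", "System design", "Cloud Computing"],
)

print(marks_by_subject["Math"])               # 60, lookup by label
print(marks_by_subject.loc["System design"])  # 89, explicit label-based lookup
print(marks_by_subject.iloc[0])               # 60, lookup by position
print(marks_by_subject.dtype)                 # the values stay numeric
```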

5. __To create a series from a dictionary__

In [8]:
data = {'React':90,"Node":85,"Flutter":50,"Django":75}
pd.Series(data)

React      90
Node       85
Flutter    50
Django     75
dtype: int64

- On passing a dict, the index of the resulting Series contains the dict's keys in insertion order, as the output above shows. (Older pandas versions sorted the keys.)
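
A short sketch of the key order, assuming a recent pandas version: the index follows the order in which the keys were inserted, and `sort_index()` can be used when alphabetical order is wanted.

```python
import pandas as pd

scores = pd.Series({"React": 90, "Node": 85, "Flutter": 50, "Django": 75})

# The index follows the dict's insertion order.
print(list(scores.index))                # ['React', 'Node', 'Flutter', 'Django']

# sort_index() returns a new Series ordered by label.
print(list(scores.sort_index().index))   # ['Django', 'Flutter', 'Node', 'React']
```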

6. __A series with missing values__
- If we pass an index label that is not a key in the dict, its value will be `NaN`.

In [9]:
subject = ["Math" , "System design" , "Cloud Computing" , "Data Analysis" , "React"] 

marks_series = pd.Series(data,index=subject)
marks_series

Math                NaN
System design       NaN
Cloud Computing     NaN
Data Analysis       NaN
React              90.0
dtype: float64

In [10]:
# Error: 4 values but only 3 index labels, so the lengths do not match
index=['Apple', 'Banana', 'Orange']
quantity = [34, 20, 30, 40]
# Uncomment to see error 👇🏿.
# pd.Series(data=quantity, index=index)
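
To see the error without stopping the notebook, the call can be wrapped in `try`/`except`. This sketch assumes the mismatch is reported as a `ValueError`; the variable names `labels` and `message` are introduced here.

```python
import pandas as pd

labels = ["Apple", "Banana", "Orange"]
quantity = [34, 20, 30, 40]

message = None
try:
    # 4 values cannot be matched to 3 labels.
    pd.Series(data=quantity, index=labels)
except ValueError as err:
    message = str(err)
    print("ValueError:", message)
```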

In [11]:
# Use a name other than `dict` so the built-in type is not shadowed
stock = {'A': 30, 'B': 40, 'C': 50}
index = ['A', 'B', 'D']
pd.Series(data=stock, index=index)

A    30.0
B    40.0
D     NaN
dtype: float64
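
The output above is `float64` even though the dict holds integers, because `NaN` is a floating-point value. As a sketch, one way to keep whole numbers alongside missing values is pandas' nullable `Int64` dtype; the variable names are illustrative.

```python
import pandas as pd

stock = {"A": 30, "B": 40, "C": 50}

# "D" is not a key, so it becomes NaN and the dtype is upcast to float64.
# "C" is not in the index, so it is dropped.
as_float = pd.Series(stock, index=["A", "B", "D"])
print(as_float.dtype)

# The nullable Int64 dtype stores integers and marks the gap as <NA>.
as_nullable_int = as_float.astype("Int64")
print(as_nullable_int)
```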

### 3.2 Manipulating Series
1. __To check for non-null values using `.notnull()`__

In [12]:
marks_series.notnull()

Math               False
System design      False
Cloud Computing    False
Data Analysis      False
React               True
dtype: bool

- `True` indicates that the value is not null.
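
The opposite check and two common follow-ups can be sketched as below, using a shortened, illustrative copy of the marks data: `isnull()` flags the missing values, `dropna()` removes them, and `fillna()` replaces them.

```python
import numpy as np
import pandas as pd

marks = pd.Series([np.nan, np.nan, 90.0], index=["Math", "System design", "React"])

# isnull() is the inverse of notnull(); summing the booleans counts the gaps.
missing_count = marks.isnull().sum()
print(missing_count)        # 2

# dropna() returns a new Series without the missing entries.
present = marks.dropna()
print(present)

# fillna() returns a new Series with the gaps replaced by a chosen value.
filled = marks.fillna(0)
print(filled)
```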
2. __To find the subjects in which the marks are above 75__

In [13]:
marks_series[marks_series > 75]

React    90.0
dtype: float64
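
The filter works because the comparison produces a boolean Series that is then used as a mask. A minimal sketch with the full marks list from earlier; conditions can be combined with `&` (and) or `|` (or), each wrapped in parentheses.

```python
import pandas as pd

marks = pd.Series(
    [60, 89, 74, 86, 100],
    index=["Math", "System design", "Cloud Computing", "Data Analysis", "React"],
)

# The comparison yields one True/False per element.
mask = marks > 75
print(mask)

# Combine two conditions; the parentheses are required.
between = marks[(marks > 75) & (marks < 100)]
print(between)
```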

3. __To assign 91 marks to 'Math'__

In [16]:
marks_series["Math"] = 91
# or
#  mark
marks_series

Math               91.0
System design       NaN
Cloud Computing     NaN
Data Analysis       NaN
React              91.0
dtype: float64

In [17]:
# Compare values
marks_series["Math"] == 75
# OR
marks_series.Math == 75

False

4. __Sorting a numeric Series__

In [18]:
import numpy as np

In [19]:
values = pd.Series([23,np.nan,45,np.nan,56,67,34,23])
values

0    23.0
1     NaN
2    45.0
3     NaN
4    56.0
5    67.0
6    34.0
7    23.0
dtype: float64

In [20]:
# Ascending order (not displayed: a cell only shows its last expression)
values.sort_values(ascending=True)
# Descending order
values.sort_values(ascending=False)

5    67.0
4    56.0
2    45.0
6    34.0
0    23.0
7    23.0
1     NaN
3     NaN
dtype: float64
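
Two details of `sort_values` are worth a sketch: missing values go last by default but can be moved with `na_position`, and the method returns a new Series rather than changing the original. The sample values are a shortened, illustrative version of the Series above.

```python
import numpy as np
import pandas as pd

values = pd.Series([23, np.nan, 45, 34])

# Place missing values first instead of last.
ordered = values.sort_values(na_position="first")
print(ordered)

# The original Series is unchanged; only the returned copy is reordered.
print(list(values.index))    # [0, 1, 2, 3]
print(list(ordered.index))   # [1, 0, 3, 2]
```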

5. __Sorting a categorical Series__

In [21]:
# Create a pandas Series of strings
# Strings are compared in lexicographical order, and
# sort_values keeps each element's original index label
string_values = pd.Series(["a", "f", "j", "d", "c"])
string_values

0    a
1    f
2    j
3    d
4    c
dtype: object

In [22]:
# ascending order
string_values.sort_values(ascending=True)

0    a
4    c
3    d
1    f
2    j
dtype: object
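
Lexicographical order is case-sensitive, so uppercase letters sort before lowercase ones. The sketch below uses made-up words and assumes a pandas version that supports the `key` argument of `sort_values` (1.1 or later) to sort without regard to case.

```python
import pandas as pd

words = pd.Series(["banana", "apple", "Cherry"])

# Default ordering: "Cherry" comes first because "C" sorts before lowercase letters.
default_order = list(words.sort_values())
print(default_order)       # ['Cherry', 'apple', 'banana']

# Compare lowercase versions of the strings to ignore case.
ignore_case = list(words.sort_values(key=lambda s: s.str.lower()))
print(ignore_case)         # ['apple', 'banana', 'Cherry']
```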

In [23]:
data = range(10)
new_ser = pd.Series(data=data)
new_ser[new_ser == 5]

5    5
dtype: int64

In [24]:
marks_series.rank(ascending=True)

Math               1.5
System design      NaN
Cloud Computing    NaN
Data Analysis      NaN
React              1.5
dtype: float64
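
In the output above both non-missing marks are equal, so they share the average of ranks 1 and 2, which is 1.5. The sketch below uses illustrative scores to show how the `method` argument changes the treatment of ties.

```python
import pandas as pd

scores = pd.Series([91, 75, 91, 60], index=["Math", "Node", "React", "Django"])

# Default method="average": tied values share the mean of their ranks.
average_rank = scores.rank()
print(average_rank)

# method="min": tied values all take the lowest rank in the group.
min_rank = scores.rank(method="min")
print(min_rank)

# method="dense" with ascending=False: the highest score is rank 1, with no gaps.
dense_rank = scores.rank(method="dense", ascending=False)
print(dense_rank)
```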

## 4. Pandas DataFrames
- A DataFrame is a tabular representation of data containing an ordered collection of columns, each of which can be a different type (numeric, string, boolean, and so on).
- The DataFrame has both a row and a column index; it can be thought of as a dict of Series all sharing the same index. In a DF, the data is stored as one or more two-dimensional blocks rather than a list, dict, or some other collection of one-dimensional arrays.
- While a DF is physically two-dimensional, it can be used to represent higher-dimensional data in tabular format using hierarchical indexing.
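
Both ideas can be sketched briefly. The first part builds a DataFrame from two Series so that the values line up by label; the second uses a `MultiIndex` to add a year level to the rows. All names and numbers below are hypothetical.

```python
import pandas as pd

# Two Series with the same labels in a different order.
marks = pd.Series({"React": 89, "Rust": 34})
cgpa = pd.Series({"Rust": 3.0, "React": 2.5})

# As columns of one DataFrame they share one index, aligned by label.
frame = pd.DataFrame({"Marks": marks, "CGPA": cgpa})
print(frame)

# Hierarchical indexing: a (year, subject) pair labels each row.
index = pd.MultiIndex.from_tuples(
    [("2023", "React"), ("2023", "Rust"), ("2024", "React")],
    names=["year", "subject"],
)
yearly = pd.DataFrame({"Marks": [89, 34, 92]}, index=index)

# Selecting on the outer level returns the rows for that year.
print(yearly.loc["2023"])
```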

### 4.1 Creating a DataFrame

1. __Creating a DataFrame from a dictionary__

In [25]:
data ={
    'Subject':["React","Rust","Golang","Elixir"],
    'Marks':(89,34,65,78), 
    'CGPA':[2.5,3.0,4.5,5.6]
}
df = pd.DataFrame(data)
df

  Subject  Marks  CGPA
0   React     89   2.5
1    Rust     34   3.0
2  Golang     65   4.5
3  Elixir     78   5.6
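
A few follow-up checks on the same data, as a sketch: the `columns` argument controls column order, `dtypes` shows one type per column, and selecting a single column returns a Series.

```python
import pandas as pd

data = {
    "Subject": ["React", "Rust", "Golang", "Elixir"],
    "Marks": (89, 34, 65, 78),
    "CGPA": [2.5, 3.0, 4.5, 5.6],
}

# The columns argument picks the order in which the dict's keys appear.
df = pd.DataFrame(data, columns=["Subject", "CGPA", "Marks"])
print(df.shape)     # (4, 3)
print(df.dtypes)    # one dtype per column

# A single column is a Series that shares the DataFrame's row index.
marks_column = df["Marks"]
print(type(marks_column).__name__)   # Series
```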
