### <center style = 'color : red '>  PANDAS : Python Data Analysis Library </center>
The pandas library is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language.

https://pandas.pydata.org/

In [None]:
import numpy as np
import pandas as pd

* The **pandas Series** object can be seen as an enhanced NumPy 1D array, and the **pandas DataFrame** can be seen as an enhanced NumPy 2D array.

* The main difference is that a pandas Series or DataFrame has an explicit index, while a NumPy array has only implicit, positional indexing. So, anywhere in your Python code where you would use something like

```python
import numpy as np
a = np.array([1,2,3])
```
you can just use
```python
import pandas as pd
a = pd.Series([1,2,3])
```

Most NumPy functions, in particular the universal functions such as `np.sqrt` or `np.exp`, also work with a pandas Series.
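
As a small illustration of NumPy functions working on a Series, the sketch below applies a NumPy universal function to one. The names `s_demo` and `roots` are made up for this example; the point is only that the result is again a Series and keeps its labels.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: a Series with string labels
s_demo = pd.Series([1.0, 4.0, 9.0], index=['a', 'b', 'c'])

# A NumPy ufunc applied to a Series returns a Series with the same index
roots = np.sqrt(s_demo)
print(roots)
```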


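Is there any performance difference between a NumPy array and a pandas Series?

Yes, for element-by-element access. Indexing into a Series is typically much slower than indexing into a NumPy array, because pandas does a lot of extra work on every lookup, and much of that work runs in Python. Operations applied to a whole Series at once are usually much closer in speed, since they are handed over to NumPy.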
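
One way to check this on your own machine is a rough timing comparison like the sketch below. The size `n` and the helper names are arbitrary choices for this example, and the measured times depend on the hardware and on the installed NumPy and pandas versions, so no particular numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

n = 10_000  # arbitrary size chosen for this sketch
arr = np.arange(n)
ser = pd.Series(arr)

def loop_array():
    # Element-by-element access on a NumPy array
    total = 0
    for i in range(n):
        total += arr[i]
    return total

def loop_series():
    # Element-by-element access on a pandas Series (positional)
    total = 0
    for i in range(n):
        total += ser.iloc[i]
    return total

# Time the element-wise loops and the vectorized sums
t_arr_loop = timeit.timeit(loop_array, number=3)
t_ser_loop = timeit.timeit(loop_series, number=3)
t_arr_sum = timeit.timeit(arr.sum, number=3)
t_ser_sum = timeit.timeit(ser.sum, number=3)

print('loop over array :', t_arr_loop)
print('loop over Series:', t_ser_loop)
print('array.sum()     :', t_arr_sum)
print('Series.sum()    :', t_ser_sum)
```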

## <center style = "color:blue"> Series </center>

A one-dimensional labeled array capable of holding any data type.

* pandas creates a default integer index when none is given.

In [None]:
s = pd.Series(
    [3,-2,1,4]
)

In [None]:
s

In [None]:
type(s)

In [None]:
# Get one element
s[0]

### String index (non-numeric labels)

In [None]:
s = pd.Series(
    [3,-2,1,4], 
    index=['a', 'b', 'c', 'd']
)

In [None]:
s['a'] 
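
With a string index, it can help to make the difference between label-based and position-based access explicit. The sketch below uses `.loc` and `.iloc` on a Series like the one above; the variable names are chosen for this example only.

```python
import pandas as pd

s_lab = pd.Series([3, -2, 1, 4], index=['a', 'b', 'c', 'd'])

by_label = s_lab.loc['c']          # label-based access
by_position = s_lab.iloc[2]        # position-based access (third element)

label_slice = s_lab.loc['a':'c']   # label slices include the end label
pos_slice = s_lab.iloc[0:2]        # position slices exclude the end position

print(by_label, by_position)
print(label_slice)
print(pos_slice)
```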

### Date index

In [None]:
dates = pd.date_range('20190301', periods=6)
dates

In [None]:
s = pd.Series(np.random.randn(6), index=dates)
s
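
A date index can then be used to select by date. The sketch below uses fixed values instead of random ones so the result is predictable; `s_dates`, `one_day` and `window` are names invented for this example.

```python
import pandas as pd

dates_demo = pd.date_range('20190301', periods=6)
s_dates = pd.Series(range(6), index=dates_demo)

# Select a single day by its timestamp label
one_day = s_dates.loc[pd.Timestamp('2019-03-02')]

# Slice by date strings; both end points are included
window = s_dates.loc['2019-03-02':'2019-03-04']

print(one_day)
print(window)
```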


### Boolean Indexing

In [None]:
s = pd.Series([3,-2,1,4])
s

In [None]:
s[3]

In [None]:
# Series s where value is >1
s[s > 1] 

In [None]:
# Series s where value is not >1
s[~(s > 1)] 

In [None]:
# s where value is < -1 or >= 3
s[(s < -1) | (s >= 3)] 
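
Behind these expressions, the comparison itself produces a boolean Series (a mask), and indexing with that mask keeps only the `True` positions. The sketch below shows the intermediate mask and an `&` combination; each condition needs its own parentheses, and the Python keywords `and` / `or` do not work element-wise. Variable names are chosen for this example.

```python
import pandas as pd

s_bool = pd.Series([3, -2, 1, 4])

mask = s_bool > 1            # boolean Series, one value per element
selected = s_bool[mask]      # keeps the elements where mask is True

# Combine conditions with & (and), | (or), ~ (not)
between = s_bool[(s_bool > 0) & (s_bool < 4)]

print(mask)
print(selected)
print(between)
```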

In [None]:
# Setting a value
s[2] = -6 # Set the element with label 2 of Series s to -6

### <center style = 'color : blue '> Data Alignment </center>

In [None]:
s = pd.Series(
    [3,-2,1,4,1], 
    index=['a', 'b', 'c', 'd','e']
)

s

In [None]:
s2 = pd.Series([-2, 7, 3, -5, 1], index=['a', 'c', 'd','e','f'])
s2

### add

In [None]:
s.add(s2)

In [None]:
s.add(s2, fill_value=0)
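
To see what alignment does, the sketch below rebuilds the two Series under new names (`left`, `right`) and compares plain addition with `add(..., fill_value=0)`. Labels present in only one Series give `NaN` in the first case, while in the second case the missing side is treated as 0.

```python
import pandas as pd

left = pd.Series([3, -2, 1, 4, 1], index=['a', 'b', 'c', 'd', 'e'])
right = pd.Series([-2, 7, 3, -5, 1], index=['a', 'c', 'd', 'e', 'f'])

# Values are matched by label, not by position
plain = left + right                     # 'b' and 'f' become NaN
filled = left.add(right, fill_value=0)   # missing values count as 0

print(plain)
print(filled)
```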

### sub

In [None]:
s.sub(s2, fill_value=0)

In [None]:
s.sub(s2,fill_value=2)

### div

In [None]:
s.div(s2, fill_value=4)

### mul 

In [None]:
s.mul(s2, fill_value=0)

### Drop values

In [None]:
s

In [None]:
# Drop values from rows (axis=0)
s.drop(['a', 'c']) 
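
Note that `drop` returns a new Series and leaves the original untouched, as the short sketch below illustrates (the names `s_full` and `s_small` are invented for this example).

```python
import pandas as pd

s_full = pd.Series([3, -2, 1, 4, 1], index=['a', 'b', 'c', 'd', 'e'])

# drop returns a new Series; s_full itself is not modified
s_small = s_full.drop(['a', 'c'])

print(s_small)
print(len(s_full))
```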

## <center style = "color:blue"> DataFrame </center>

A two-dimensional labeled data structure with columns of potentially different types.

### Create a DataFrame from a dictionary

In [None]:
data = {
    'Country': ['Morocco', 'China', 'France','Indonesia','Spain'],
    'Capital': ['Rabat', 'Beijing', 'Paris','Jakarta','Madrid'],
    'Population': [30000000, 1000000000, 80000000,130000000,75000000]
}

df = pd.DataFrame(
    data,
    columns=['Country', 'Capital', 'Population']
)

In [None]:
df

In [None]:
df.head(2) # Get the first rows

In [None]:
df.tail(3) # Get the last rows

In [None]:
df.info() # Summary information about the DataFrame

In [None]:
print(df.index) # Get the index
print(df.columns) # List of columns

In [None]:
print('shape : ', df.shape) # (rows, columns): shape of the DataFrame

In [None]:
df.describe() # Summary statistics

### DataFrame selection & filtering

In [None]:
df['Country']

In [None]:
df[['Country','Population']] # Select only two columns

In [None]:
# Boolean filtering: a boolean expression selects the rows
df[df['Population']>60000000] # Keep only the rows where the condition is True
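
Several conditions can be combined in one filter. The sketch below rebuilds a small frame with the same countries under the name `df_demo` (a name chosen for this example) and shows an `&` combination and `isin`.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Country': ['Morocco', 'China', 'France', 'Indonesia', 'Spain'],
    'Population': [30000000, 1000000000, 80000000, 130000000, 75000000],
})

# Each condition needs its own parentheses when combined with & or |
big_not_china = df_demo[(df_demo['Population'] > 60000000)
                        & (df_demo['Country'] != 'China')]

# isin keeps the rows whose value is in a given list
picked = df_demo[df_demo['Country'].isin(['Morocco', 'Spain'])]

print(big_not_china)
print(picked)
```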

In [None]:
# Getting a subset
df[:] # Get a subset of a DataFrame (here, all rows)

In [None]:
# Getting a subset
df[1:3] # Get a subset of a DataFrame (rows at positions 1 and 2)

-----

In [None]:
df[1:3,2] # attention: raises an error, [] does not accept a (rows, columns) pair

In [None]:
df[1:,['Capital']] # attention: raises an error, use .loc or .iloc instead
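
Assuming the intent of the two failing lines was "rows 1 and 2 of the third column" and "the Capital column from row 1 onward", they can be sketched with `.iloc` and `.loc` as follows. The frame is rebuilt here as `df_demo` so the example runs on its own.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Country': ['Morocco', 'China', 'France', 'Indonesia', 'Spain'],
    'Capital': ['Rabat', 'Beijing', 'Paris', 'Jakarta', 'Madrid'],
    'Population': [30000000, 1000000000, 80000000, 130000000, 75000000],
})

# Position-based: rows 1 and 2, column at position 2 ('Population')
rows_1_2_pop = df_demo.iloc[1:3, 2]

# Label-based: rows from label 1 onward, column 'Capital'
capitals_from_1 = df_demo.loc[1:, ['Capital']]

print(rows_1_2_pop)
print(capitals_from_1)
```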

### loc (label-based location)

In [None]:
df.loc[0]

In [None]:
# By label
df.loc[[0], ['Country']] # Select rows and columns by lists of labels (returns a 1x1 DataFrame)


In [None]:
# By label
df.loc[:, ['Population']] # Select all rows of the 'Population' column (returns a DataFrame)

### iloc (integer-position location)

In [None]:
# By position
df.iloc[0,0] # Select a single value by row & column position

In [None]:
# By position
df.iloc[1:4,[0,1]] # Select rows 1 to 3 and the first two columns
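
One detail that is easy to miss: a slice with `.loc` includes its end label, while a slice with `.iloc` excludes its end position. The sketch below shows this on a rebuilt frame (`df_demo` is a name chosen for this example).

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Country': ['Morocco', 'China', 'France', 'Indonesia', 'Spain'],
    'Population': [30000000, 1000000000, 80000000, 130000000, 75000000],
})

by_label = df_demo.loc[1:3]      # labels 1, 2 and 3 (end included)
by_position = df_demo.iloc[1:3]  # positions 1 and 2 (end excluded)

print(by_label)
print(by_position)
```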


### Dropping columns/rows

In [None]:
#Dropping
df.drop('Capital', axis=1) # Drop values from columns(axis=1)

In [None]:
df.drop('Capital', axis=1, inplace=True) # inplace=True modifies df itself instead of returning a copy

In [None]:
df

### Sort & Rank

In [None]:
df.sort_index() #Sort by labels along an axis

In [None]:
# Sort by the values along an axis
df.sort_values(by='Population') # by = 'Country'

In [None]:
# Sort by the values along an axis
df.sort_values(by='Country')

In [None]:
# Assign ranks to entries
df.rank() 

In [None]:
df['Population_rank'] = df['Population'].rank()
df
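
Both sorting and ranking accept `ascending=False` to reverse the order. The sketch below, on a rebuilt frame named `df_demo` for this example, sorts from the largest population down and ranks the largest population as 1.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Country': ['Morocco', 'China', 'France', 'Indonesia', 'Spain'],
    'Population': [30000000, 1000000000, 80000000, 130000000, 75000000],
})

# Sort from the largest population to the smallest
by_pop_desc = df_demo.sort_values(by='Population', ascending=False)

# Rank so that the largest population gets rank 1
rank_desc = df_demo['Population'].rank(ascending=False)

print(by_pop_desc)
print(rank_desc)
```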

### Summary

### sum

In [None]:
df.sum() # Sum of values

### cumsum (cumulative sum)

In [None]:
df.cumsum() # Cumulative sum of values

###  min and max

In [None]:
print(df["Population"].min()) # Minimum value
df["Population"].max() # Maximum value


### What share of the population in our dataset does China represent?

In [None]:
df[df['Country']=='China']['Population'] / df['Population'].sum()
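
The expression above returns a one-element Series holding a fraction. If a single percentage is wanted, one possible variant (a sketch on a rebuilt frame, with names chosen for this example) extracts the scalar and multiplies by 100.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Country': ['Morocco', 'China', 'France', 'Indonesia', 'Spain'],
    'Population': [30000000, 1000000000, 80000000, 130000000, 75000000],
})

# Pick China's population as a scalar
china_pop = df_demo.loc[df_demo['Country'] == 'China', 'Population'].iloc[0]

# Share of the total, expressed as a percentage
share_pct = china_pop / df_demo['Population'].sum() * 100

print(round(share_pct, 2))
```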

### idxmin and idxmax - index label of the smallest and largest value

In [None]:
df

In [None]:
df['Population'].idxmin()

In [None]:
df['Population'].idxmax() # Index label of the maximum value
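
The label returned by `idxmax` or `idxmin` can be passed to `.loc` to read other columns of that row. A minimal sketch, on a rebuilt frame named `df_demo` for this example:

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Country': ['Morocco', 'China', 'France', 'Indonesia', 'Spain'],
    'Population': [30000000, 1000000000, 80000000, 130000000, 75000000],
})

largest_label = df_demo['Population'].idxmax()
smallest_label = df_demo['Population'].idxmin()

# Use the labels to look up the matching countries
largest_country = df_demo.loc[largest_label, 'Country']
smallest_country = df_demo.loc[smallest_label, 'Country']

print(largest_country, smallest_country)
```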

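### mean & median

In [None]:
df.mean(numeric_only=True) # Mean of the numeric columns (text columns would raise an error in recent pandas versions)


In [None]:
df.median(numeric_only=True) # Median of the numeric columns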
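
When the frame mixes text and numbers, the statistics can be restricted to numeric data either per column or with `numeric_only=True`. The sketch below shows both on a rebuilt frame (`df_demo` is a name chosen for this example).

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Country': ['Morocco', 'China', 'France', 'Indonesia', 'Spain'],
    'Population': [30000000, 1000000000, 80000000, 130000000, 75000000],
})

# Statistics on a single numeric column
mean_pop = df_demo['Population'].mean()
median_pop = df_demo['Population'].median()

# Statistics on the whole frame, skipping non-numeric columns
numeric_means = df_demo.mean(numeric_only=True)

print(mean_pop, median_pop)
print(numeric_means)
```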

### <center style='color:blue'> Applying Functions </center>


In [92]:
f = lambda x: x*2

In [93]:
df.apply(f) # Apply function to each column

| | Country | Capital | Population | Population_rank |
|---|---|---|---|---|
| 0 | MoroccoMorocco | RabatRabat | 60000000 | 2.0 |
| 1 | ChinaChina | BeijingBeijing | 2000000000 | 10.0 |
| 2 | FranceFrance | ParisParis | 160000000 | 6.0 |
| 3 | IndonesiaIndonesia | JakartaJakarta | 260000000 | 8.0 |
| 4 | SpainSpain | MadridMadrid | 150000000 | 4.0 |


In [94]:
df.applymap(f) # Apply function element-wise (deprecated in pandas 2.1+, where df.map(f) replaces it)

| | Country | Capital | Population | Population_rank |
|---|---|---|---|---|
| 0 | MoroccoMorocco | RabatRabat | 60000000 | 2.0 |
| 1 | ChinaChina | BeijingBeijing | 2000000000 | 10.0 |
| 2 | FranceFrance | ParisParis | 160000000 | 6.0 |
| 3 | IndonesiaIndonesia | JakartaJakarta | 260000000 | 8.0 |
| 4 | SpainSpain | MadridMadrid | 150000000 | 4.0 |


In [95]:
df['Population'].apply(lambda x:x/3+1) # Apply function

0    1.000000e+07
1    3.333333e+08
2    2.666667e+07
3    4.333333e+07
4    2.500000e+07
Name: Population, dtype: float64

In [96]:
df['Population'] # unchanged: apply returns a new Series and does not modify df

0      30000000
1    1000000000
2      80000000
3     130000000
4      75000000
Name: Population, dtype: int64
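
For simple arithmetic like this, the same result can be obtained without `apply` by operating on the whole column at once. The sketch below compares the two forms on a standalone Series (`pop` is a name chosen for this example); neither form modifies the original data.

```python
import numpy as np
import pandas as pd

pop = pd.Series([30000000, 1000000000, 80000000, 130000000, 75000000],
                name='Population')

via_apply = pop.apply(lambda x: x / 3 + 1)  # calls the function once per element
via_vector = pop / 3 + 1                    # one vectorized operation on the column

print(np.allclose(via_apply, via_vector))
print(pop.tolist())  # the original values are untouched
```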

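### create a new column

In [99]:
df['new_column'] = df['Population'].apply(lambda x:x/3+1) # Store the result in a new column
df['Population'] = df['Population'].apply(lambda x:x/3+1) # Assign back to overwrite the existing column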

In [100]:
df

| | Country | Capital | Population | Population_rank | new_column |
|---|---|---|---|---|---|
| 0 | Morocco | Rabat | 10000000.0 | 1.0 | 10000000.0 |
| 1 | China | Beijing | 333333300.0 | 5.0 | 333333300.0 |
| 2 | France | Paris | 26666670.0 | 3.0 | 26666670.0 |
| 3 | Indonesia | Jakarta | 43333330.0 | 4.0 | 43333330.0 |
| 4 | Spain | Madrid | 25000000.0 | 2.0 | 25000000.0 |


### <center style='color:blue'> Create a DataFrame from a file </center>

### csv file 

In [101]:
df = pd.read_csv("data\\Fire_Department_Calls_for_Service_03_23_2019.csv",
                sep = ',')
df.head()


| | Call Number | Unit ID | Incident Number | Call Type | Call Date | Watch Date | Received DtTm | Entry DtTm | Dispatch DtTm | Response DtTm | ... | ALS Unit | Call Type Group | Number of Alarms | Unit Type | Unit sequence in call dispatch | Fire Prevention District | Supervisor District | Neighborhooods - Analysis Boundaries | Location | RowID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 190824304 | KM11 | 19035264 | Medical Incident | 03/23/2019 | 03/23/2019 | 03/23/2019 11:58:14 PM | 03/24/2019 12:00:09 AM | 03/24/2019 12:00:36 AM | 03/24/2019 12:01:01 AM | ... | False | Non Life-threatening | 1 | PRIVATE | 1 | 2 | 5 | Western Addition | (37.78200642291339, -122.42822947127583) | 190824304-KM11 |
| 1 | 190824295 | KM03 | 19035263 | Medical Incident | 03/23/2019 | 03/23/2019 | 03/23/2019 11:56:34 PM | 03/23/2019 11:56:34 PM | 03/23/2019 11:56:59 PM | 03/23/2019 11:57:47 PM | ... | False | Potentially Life-Threatening | 1 | PRIVATE | 2 | 1 | 3 | Financial District/South Beach | (37.7980449492818, -122.3963670843851) | 190824295-KM03 |
| 2 | 190824295 | E28 | 19035263 | Medical Incident | 03/23/2019 | 03/23/2019 | 03/23/2019 11:56:34 PM | 03/23/2019 11:56:34 PM | 03/23/2019 11:56:59 PM | 03/23/2019 11:57:46 PM | ... | False | Potentially Life-Threatening | 1 | ENGINE | 1 | 1 | 3 | Financial District/South Beach | (37.7980449492818, -122.3963670843851) | 190824295-E28 |
| 3 | 190824278 | 59 | 19035262 | Medical Incident | 03/23/2019 | 03/23/2019 | 03/23/2019 11:48:04 PM | 03/23/2019 11:50:41 PM | 03/23/2019 11:50:59 PM | 03/23/2019 11:51:18 PM | ... | True | Non Life-threatening | 1 | MEDIC | 1 | 9 | 11 | Excelsior | (37.709059057853175, -122.45174696338128) | 190824278-59 |
| 4 | 190824270 | E29 | 19035261 | Medical Incident | 03/23/2019 | 03/23/2019 | 03/23/2019 11:48:21 PM | 03/23/2019 11:49:36 PM | 03/23/2019 11:50:31 PM | 03/23/2019 11:51:45 PM | ... | True | Non Life-threatening | 1 | ENGINE | 1 | 2 | 6 | South of Market | (37.77128552595309, -122.41320426710206) | 190824270-E29 |
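
`read_csv` also accepts options that control how columns are typed while loading. The sketch below is only an illustration: it reads a tiny inline sample (a few columns and rows copied from the output above, stored in the made-up variable `csv_text`) instead of the real file, keeps `Unit ID` as text and parses `Call Date` as a date.

```python
import io

import pandas as pd

# Small inline sample standing in for the real CSV file (assumption for this sketch)
csv_text = """Call Number,Unit ID,Call Type,Call Date
190824304,KM11,Medical Incident,03/23/2019
190824295,KM03,Medical Incident,03/23/2019
190824295,E28,Medical Incident,03/23/2019
"""

calls = pd.read_csv(
    io.StringIO(csv_text),
    dtype={'Unit ID': str},      # keep unit identifiers as text
    parse_dates=['Call Date'],   # convert this column to datetime
)

print(calls.dtypes)
print(calls.head())
```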


### excel file

In [102]:
df = pd.read_excel("data\\Fire_Department_Calls_for_Service_03_23_2019.xlsx",
                   sheet_name='Sheet1',
                   #index_col=None,
                   #header=None,
                   # dtype maps column names to types; the keys must match column names in the sheet
                   dtype={'Name': str, 'Value': float})
df.head(2)

| | Call Number | Unit ID | Incident Number | Call Type | Call Date | Watch Date | Received DtTm | Entry DtTm | Dispatch DtTm | Response DtTm | ... | ALS Unit | Call Type Group | Number of Alarms | Unit Type | Unit sequence in call dispatch | Fire Prevention District | Supervisor District | Neighborhooods - Analysis Boundaries | Location | RowID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 190824304 | KM11 | 19035264 | Medical Incident | 03/23/2019 | 03/23/2019 | 03/23/2019 11:58:14 PM | 03/24/2019 12:00:09 AM | 03/24/2019 12:00:36 AM | 03/24/2019 12:01:01 AM | ... | False | Non Life-threatening | 1 | PRIVATE | 1 | 2 | 5 | Western Addition | (37.78200642291339, -122.42822947127583) | 190824304-KM11 |
| 1 | 190824295 | KM03 | 19035263 | Medical Incident | 03/23/2019 | 03/23/2019 | 03/23/2019 11:56:34 PM | 03/23/2019 11:56:34 PM | 03/23/2019 11:56:59 PM | 03/23/2019 11:57:47 PM | ... | False | Potentially Life-Threatening | 1 | PRIVATE | 2 | 1 | 3 | Financial District/South Beach | (37.7980449492818, -122.3963670843851) | 190824295-KM03 |


### <center style='color:blue'> DataFrame to NumPy ndarray </center>

In [103]:
a = df.values

In [104]:
type(a)

numpy.ndarray

In [105]:
a

array([[190824304, 'KM11', 19035264, ..., 'Western Addition',
        '(37.78200642291339, -122.42822947127583)', '190824304-KM11'],
       [190824295, 'KM03', 19035263, ...,
        'Financial District/South Beach',
        '(37.7980449492818, -122.3963670843851)', '190824295-KM03'],
       [190824295, 'E28', 19035263, ...,
        'Financial District/South Beach',
        '(37.7980449492818, -122.3963670843851)', '190824295-E28'],
       ...,
       [190820023, 'E38', 19034833, ..., 'Pacific Heights',
        '(37.786458147838374, -122.43077642337992)', '190820023-E38'],
       [190820010, 'QRV1', 19034832, ...,
        'Financial District/South Beach',
        '(37.784660412196764, -122.39991472915987)', '190820010-QRV1'],
       [190820010, 84, 19034832, ..., 'Financial District/South Beach',
        '(37.784660412196764, -122.39991472915987)', '190820010-84']],
      dtype=object)
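
The `dtype=object` above appears because the frame mixes text and numbers, so NumPy has to fall back to a generic object array. The sketch below, on a small rebuilt frame named `df_demo` for this example, contrasts that with converting only a numeric column, using the `to_numpy()` method.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Country': ['Morocco', 'China', 'France', 'Indonesia', 'Spain'],
    'Population': [30000000, 1000000000, 80000000, 130000000, 75000000],
})

mixed = df_demo.to_numpy()                    # text + numbers -> object array
numeric = df_demo[['Population']].to_numpy()  # numbers only -> integer array

print(mixed.dtype)
print(numeric.dtype, numeric.shape)
```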
