# Pandas Basic Tutorial

* **Pandas**:
pandas is a fast, powerful, flexible, and easy-to-use open-source tool for data analysis and manipulation, especially suited to **tabular data** and **time-series data**.
![212.png](attachment:212.png)
![2.png](attachment:2.png)

## Content
1. [Import Libraries](#one)
2. [Pandas Data Structure](#two)
3. [CSV, Excel and SQL databases](#three)
4. [Exporting Data](#four)
5. [Create Test Objects](#five)
6. [Summarize Data](#six)
7. [Selection & Filtering](#seven)
8. [Sort Data](#eight)
9. [Rename & Defining New & Change columns](#nine)
10. [Drop Data](#ten)
11. [Convert Data Types](#eleven)
<hr>

How to use this notebook:

Explanations are kept to a minimum.

This notebook is meant for readers who want to see how the code works right away.

<hr>

<a id="one"></a>

# 1. Import Libraries

<hr>

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import os
print(os.listdir("../input/stocknews"))

<hr>

<a id="two"></a>

# 2. Pandas Data Structure

Pandas has two main data structures:

1. **Series** - a one-dimensional labeled array
![141.PNG](attachment:141.PNG)
2. **DataFrame** - a two-dimensional labeled table with rows and columns
![3211.jpg](attachment:3211.jpg)

<hr>

## Series

In [None]:
Series = pd.Series([3,-5,7,4], index = ['a','b','c','d'])


In [None]:
type(Series)

In [None]:
Series
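
As a minimal sketch of what "labeled" means here, the example below rebuilds the same values under a hypothetical lowercase name `s` (not used elsewhere in this notebook) and reads one element by label and by position.

```python
import pandas as pd

# Same values and labels as above, stored under a hypothetical name `s`
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

by_label = s['b']        # label-based lookup
by_position = s.iloc[1]  # position-based lookup of the same element
doubled = s * 2          # arithmetic is applied element by element

print(by_label, by_position)
print(doubled)
```

Both lookups return the same element because label `'b'` sits at position 1.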

<hr>

## DataFrame 

In [None]:
data = {'Country' : ['Belgium','India','Brazil'],
        'Capital' : ['Brussels','New Delhi','Brasilia'],
        'Population' : [1234,1234,1234]}
datas= pd.DataFrame(data,columns=['Country','Capital','Population'])

In [None]:
print(type(data))
print(type(datas))

In [None]:
dictionary={"Name":["John","James","Awi","Kewi"],
           "Age":[15,16,17,18]}
print(dictionary)

In [None]:
data_dict = pd.DataFrame(data=dictionary,index=range(4),columns=["Name","Age"])
print(data_dict)

In [None]:
dict_new={"Name":["King","Arthur","Jurdi","Hirdi"],
           "Age":[25,35,45,55]}
dict_new=pd.DataFrame(data=dict_new,index=range(4),columns=["Name","Age"])
print(dict_new)


In [None]:
data_dict = pd.concat([data_dict,dict_new],axis = 0, ignore_index=True)
print(data_dict)
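
To make the effect of `ignore_index=True` visible, here is a small sketch with two made-up frames (`first` and `second` are illustrative names). Without the flag, each frame keeps its own row labels, so the labels repeat.

```python
import pandas as pd

# Two small, made-up frames with the same columns
first = pd.DataFrame({"Name": ["John", "James"], "Age": [15, 16]})
second = pd.DataFrame({"Name": ["King", "Arthur"], "Age": [25, 35]})

kept = pd.concat([first, second], axis=0)                           # original labels are kept
renumbered = pd.concat([first, second], axis=0, ignore_index=True)  # labels are rebuilt

print(list(kept.index))        # [0, 1, 0, 1]
print(list(renumbered.index))  # [0, 1, 2, 3]
```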

<hr>

<a id="three"></a>

# 3. CSV, Excel and SQL databases
With pandas, we can read data from CSV files, Excel files, and SQL databases.

<hr>

## CSV

In [None]:
df = pd.read_csv("../input/stocknews/upload_DJIA_table.csv")
type(df)

In [None]:
df.head()

<hr>

## Excel

In [None]:
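# pd.read_excel('filename') -> Reads an Excel file into a DataFrame
# df.to_excel('dir/dataFrame.xlsx', sheet_name='Sheet1') -> Writes a DataFrame to an Excel file (a DataFrame method, not a pd function)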

<hr>

## Others (json,SQL,table,html)

In [None]:
# pd.read_sql(query,connection_object) -> Reads from a SQL table/database
# pd.read_table(filename) -> From a delimited text file(like TSV)
# pd.read_json(json_string) -> Reads from a json formatted string, URL or file
# pd.read_html(url) -> Parses an html URL, string or file and extracts tables to a list of dataframes
# pd.read_clipboard() -> Takes the contents of your clipboard and passes them to read_csv()
# pd.DataFrame(dict) -> From a dict, keys for columns names, values for data as lists
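
The readers listed above can be tried without any files on disk. The sketch below assumes made-up CSV and JSON text held in memory and wraps it in `io.StringIO`, which the readers accept as a file-like object. All values are invented for illustration.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a file on disk
csv_text = "Date,Open,Close\n2016-01-01,100.0,101.5\n2016-01-02,101.5,99.0\n"
csv_df = pd.read_csv(io.StringIO(csv_text))

# Made-up JSON text: a list of records
json_text = '[{"Name": "John", "Age": 15}, {"Name": "James", "Age": 16}]'
json_df = pd.read_json(io.StringIO(json_text))

print(csv_df)
print(json_df)
```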

<hr>

<a id="four"></a>

# 4. Exporting Data

In [None]:
# df.to_csv(filename) -> Writes to a CSV file
# df.to_excel(filename) -> Writes to an Excel file
# df.to_sql(table_name, connection_object) -> Writes to a SQL table
# df.to_json(filename) -> Writes to a file in JSON format
# df.to_html(filename) -> Saves as an HTML table
# df.to_clipboard() -> Writes to the clipboard
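
A quick way to check an export is to write it and read it back. The sketch below does a CSV round trip through an in-memory buffer instead of a real file. The frame `original` is made up for illustration.

```python
import io
import pandas as pd

original = pd.DataFrame({"Name": ["John", "James"], "Age": [15, 16]})

buffer = io.StringIO()
original.to_csv(buffer, index=False)  # index=False leaves the row labels out of the output
buffer.seek(0)                        # rewind before reading
restored = pd.read_csv(buffer)

print(restored)
```

Leaving out `index=False` would write the row labels as an extra unnamed column, which then shows up when the file is read back.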

<hr>

<a id="five"></a>

# 5. Create Test Objects

In [None]:
pd.DataFrame(np.random.rand(20,5)) # row , column
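
If the test data should be the same on every run, one option is a seeded generator. This is a sketch using `np.random.default_rng`; the seed value and the column names are arbitrary choices, not part of the original.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # fixed seed, chosen arbitrarily
test_df = pd.DataFrame(rng.random((20, 5)), columns=list("ABCDE"))  # 20 rows, 5 columns

print(test_df.shape)
print(test_df.head())
```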

<hr>

<a id="six"></a>

# 6. Summarize Data

<hr>

## df.info()
This method prints a summary of the DataFrame: the column names, the number of non-null values in each column, the data types, and the memory usage.


In [None]:
df.info()

<hr>

## df.shape
This attribute holds the number of rows and columns as a tuple `(rows, columns)`. It is an attribute, not a method, so it is used without parentheses.

In [None]:
df.shape

<hr>

## df.index
This attribute returns the index (the row labels) of the DataFrame.

In [None]:
df.index

<hr>

## df.columns
This attribute returns the column labels of the DataFrame.

In [None]:
df.columns

In [None]:
for col in df.columns:
    print(col)
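
The structural attributes above can be tried on a tiny frame. This is a sketch with made-up data; `small` is an illustrative name.

```python
import pandas as pd

small = pd.DataFrame({"Open": [10.0, 11.0, 12.0], "Close": [10.5, 10.8, 12.4]})

n_rows, n_cols = small.shape  # attribute, so no parentheses
print(n_rows, n_cols)         # 3 2
print(list(small.columns))    # ['Open', 'Close']
print(len(small.index))       # 3
```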

<hr>

## df.count()
This method returns the number of non-null values in each column.

In [None]:
df.count()

<hr>

## df.sum()
This method returns the sum of the values in each column. For string columns such as `Date`, the values are concatenated.

In [None]:
df.sum()

In [None]:
df.sample(3)

<hr>

## df.cumsum()
This method returns the cumulative sum of each column.

In [None]:
df.cumsum().head()

<hr>

## df.min()
This method returns the smallest value in each column.

In [None]:
df.min()

<hr>

## df.max()
This method returns the largest value in each column.

In [None]:
df.max()

<hr>

## idxmin()
This method returns the index label of the smallest value, not the value itself.
On a Series it returns a single label; on a DataFrame it returns one label per column.


In [None]:
print("df: ",df['Open'].idxmin())
print("series: ", Series.idxmin())

<hr>

## idxmax()
This method returns the index label of the largest value, not the value itself.

In [None]:
print("df: ",df['Open'].idxmax())
print("series: ",Series.idxmax())
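
The difference between a value and its label is easy to see on a small example. This sketch reuses the values of the Series defined earlier under a hypothetical name `s`, plus a made-up frame.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

print(s.min(), s.idxmin())  # -5 b  (value, then its label)
print(s.max(), s.idxmax())  # 7 c

# On a DataFrame, idxmin returns one row label per column
frame = pd.DataFrame({"Open": [10.0, 9.0, 12.0], "Close": [11.0, 9.5, 8.0]})
labels = frame.idxmin()
print(labels)
```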

<hr>

## df.describe()

This method returns basic summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum).
By default, only numeric columns are included.

In [None]:
df.describe()

In [None]:
df[['Open','High','Low']].describe()

<hr>

## df.mean()
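This method returns the mean of each numeric column. Passing `numeric_only=True` skips non-numeric columns such as `Date`, which would otherwise raise an error in recent pandas versions.

In [None]:
df.mean(numeric_only=True)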

<hr>

## df.median()
This method returns the median of each numeric column.

In [None]:
df.median(numeric_only=True)

<hr>

## df.quantile([0.25,0.75])
This method calculates the 0.25 and 0.75 quantiles (the first and third quartiles) of each numeric column.

In [None]:
df.quantile([0.25,0.75], numeric_only=True)

In [None]:
df.quantile([0.5], numeric_only=True)

<hr>

## df.var()
This method calculates the variance of each numeric column.

In [None]:
df.var(numeric_only=True)

<hr>

## df.std()

This method calculates the standard deviation of each numeric column.

In [None]:
df.std(numeric_only=True)
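
Variance and standard deviation are related by a square root, and pandas computes the sample versions by default (`ddof=1`). The sketch below checks both points on a made-up Series.

```python
import numpy as np
import pandas as pd

values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up data, mean 5.0

sample_var = values.var()            # ddof=1 by default: divides by n - 1
population_var = values.var(ddof=0)  # divides by n
sample_std = values.std()            # square root of the sample variance

print(population_var)                       # 4.0
print(np.isclose(sample_std ** 2, sample_var))
```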

<hr>

## df.cummax()

This method returns the cumulative maximum: each entry is the largest value seen so far in its column.

In [None]:
df.cummax()

<hr>

## df.cummin()
This method returns the cumulative minimum: each entry is the smallest value seen so far in its column.


In [None]:
df.cummin()

<hr>

## df['columnName'].cumprod()
This method returns the cumulative product of the column.

In [None]:
df['Open'].cumprod().head()
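
The four cumulative methods are easiest to compare side by side on a short, made-up Series.

```python
import pandas as pd

s = pd.Series([1, 3, 2, 5])

print(s.cumsum().tolist())   # [1, 4, 6, 11]  running total
print(s.cumprod().tolist())  # [1, 3, 6, 30]  running product
print(s.cummax().tolist())   # [1, 3, 3, 5]   largest value so far
print(s.cummin().tolist())   # [1, 1, 1, 1]   smallest value so far
```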

<hr>

## len(df)
This returns the number of rows in the DataFrame.

In [None]:
len(df)

<hr>

## df.isnull()
Checks each value for null (missing) and returns a DataFrame of booleans. Chaining `.sum()` counts the nulls in each column.

In [None]:
df.isnull().head()

In [None]:
df.isnull().sum()
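
The stock data may not contain any missing values, so here is a sketch on a made-up frame with a few `NaN` entries. It also shows that `count()` is the complement of `isnull().sum()`.

```python
import numpy as np
import pandas as pd

gaps = pd.DataFrame({"Open": [10.0, np.nan, 12.0],
                     "Volume": [100.0, np.nan, np.nan]})

missing_per_column = gaps.isnull().sum()  # True counts as 1, so this counts the nulls
present_per_column = gaps.count()         # non-null values per column

print(missing_per_column)
print(present_per_column)
```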

<hr>

## df.corr()

This method returns the pairwise correlation between the numeric columns.

In [None]:
df.corr(numeric_only=True)

In [None]:
# seaborn (sns) and matplotlib (plt) were imported in section 1
sns.heatmap(df.corr(numeric_only=True), annot=True)
plt.show()
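
To see what the correlation values mean, the sketch below uses a made-up frame in which one column rises exactly with `Open` and another falls exactly as `Open` rises. The `numeric_only` argument of `corr` assumes pandas 1.5 or newer.

```python
import pandas as pd

prices = pd.DataFrame({
    "Date": ["2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04"],
    "Open": [1.0, 2.0, 3.0, 4.0],
    "High": [2.0, 4.0, 6.0, 8.0],  # exactly 2 * Open
    "Low":  [4.0, 3.0, 2.0, 1.0],  # falls as Open rises
})

corr = prices.corr(numeric_only=True)  # the string column Date is skipped
print(corr)
```

A perfectly linear increasing relationship gives 1, a perfectly linear decreasing one gives -1.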

<hr>

<a id="seven"></a>

# 7. Selection & Filtering

<hr>

## Series['b']
This code returns the value stored under the label `'b'` in the Series.

In [None]:
Series['b']

<hr>

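## df[n:m]
This code returns the rows from position n up to, but not including, position m. Leaving out m returns everything from position n to the end.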

In [None]:
df[1982:]

<hr>

## df.iloc[[n],[m]]
This code returns the value at row position n and column position m. Because both positions are given as lists, the result is a 1x1 DataFrame.

In [None]:
df.iloc[[0],[3]]

<hr>

## df.loc[n:m]
This code selects rows by index label, from label n to label m, with both ends included.

In [None]:
df.loc[5:7]
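
The main difference between `loc` and `iloc` slices is whether the end is included. A short sketch on a made-up frame with the default integer index:

```python
import pandas as pd

frame = pd.DataFrame({"Open": [10, 11, 12, 13, 14]})

by_label = frame.loc[1:3]      # labels 1, 2, 3: the end label is included
by_position = frame.iloc[1:3]  # positions 1, 2: the end position is excluded

print(len(by_label), len(by_position))  # 3 2
```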

<hr>

## df['columnName'].nunique()
This method returns the number of distinct values in the column.

In [None]:
df['Open'].nunique()

<hr>

## df['columnName'].unique()
This method returns the distinct values in the column, each listed once.

In [None]:
df['Open'].unique()
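
A made-up Series with repeated values shows the two methods side by side. `value_counts()` is not used in the original; it is added here only to show how often each value occurs.

```python
import pandas as pd

s = pd.Series([5, 3, 5, 8, 3, 5])

distinct = s.unique()       # distinct values, in order of first appearance
n_distinct = s.nunique()    # how many distinct values there are
counts = s.value_counts()   # occurrences of each value

print(distinct.tolist())    # [5, 3, 8]
print(n_distinct)           # 3
print(counts)
```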

<hr>

## df.sample(frac=0.5)
This method returns a random sample containing the given fraction of the rows (here, half of them).

In [None]:
df.sample(frac=0.5).head()

<hr>

## df.nlargest(n,'columnName')
This method returns the n rows with the largest values in the specified column.


In [None]:
df.nlargest(5,'Open')

<hr>

## df.nsmallest(n,'columnName')
This method returns the n rows with the smallest values in the specified column.

In [None]:
df.nsmallest(3,'Open')

<hr>

## df[df.columnName < 5]
This code returns the rows where the specified column is less than 5. The example below uses `>` instead, keeping the rows where `Open` exceeds a threshold.

In [None]:
df[df.Open > 18281.949219]

<hr>

## Create Filter

In [None]:
filters = df.Date > '2016-06-27'
df[filters]
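
Filters stored in variables can be combined. The sketch below uses a made-up frame and the element-wise operators `&` (and) and `|` (or); the column names follow the stock data, the values are invented.

```python
import pandas as pd

prices = pd.DataFrame({
    "Date": ["2016-06-24", "2016-06-27", "2016-06-28", "2016-06-29"],
    "Open": [100.0, 95.0, 97.0, 101.0],
})

after_date = prices.Date > "2016-06-27"  # ISO-formatted date strings compare in date order
high_open = prices.Open > 96

both = prices[after_date & high_open]    # rows meeting both conditions
either = prices[after_date | high_open]  # rows meeting at least one condition

print(both)
print(either)
```

When conditions are written inline rather than stored in variables, each one needs its own parentheses, for example `prices[(prices.Open > 96) & (prices.Open < 101)]`.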

<hr>

## df.filter(regex='code')
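This method selects the columns whose names match a regular expression (regex). It filters on the labels, not on the values in the data. The example below keeps the columns whose names start with `L`.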

In [None]:
df.filter(regex='^L').head()
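
Here is a sketch on a made-up one-row frame that has the same column names as the stock data. The `like` argument, which matches a plain substring, is shown as an alternative to `regex`.

```python
import pandas as pd

frame = pd.DataFrame({"Open": [1], "High": [2], "Low": [0],
                      "Close": [1], "Adj Close": [1]})

starts_with_l = frame.filter(regex="^L")     # column names starting with L
contains_close = frame.filter(like="Close")  # column names containing "Close"

print(list(starts_with_l.columns))   # ['Low']
print(list(contains_close.columns))  # ['Close', 'Adj Close']
```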

<hr>

<a id="eight"></a>

# 8. Sort Data

<hr>

## df.sort_values('columnName')
This method sorts the rows by the specified column, from low to high.

In [None]:
df.sort_values('Open').head()

<hr>

## df.sort_values('columnName',ascending=False)
This method sorts the rows by the specified column, from high to low.

In [None]:
df.sort_values(by='Date', ascending=False).head()

<hr>

## df.sort_index()
This method sorts the rows by the DataFrame index, from smallest to largest.

In [None]:
df.sort_index().head()
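
Sorting can also use several columns at once, with a direction for each. This sketch uses a made-up frame; `sort_index()` then restores the original row order.

```python
import pandas as pd

frame = pd.DataFrame({"Name": ["Awi", "John", "Kewi", "James"],
                      "Age": [17, 15, 17, 16]})

# Age from high to low; ties are broken by Name from A to Z
by_two = frame.sort_values(["Age", "Name"], ascending=[False, True])
restored = by_two.sort_index()

print(by_two["Name"].tolist())  # ['Awi', 'Kewi', 'James', 'John']
print(list(restored.index))     # [0, 1, 2, 3]
```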

<hr>

<a id="nine"></a>

# 9. Rename & Defining New & Change columns

<hr>

## df.rename(columns={'columnName':'newColumnName'})

In [None]:
df.rename(columns= {'Adj Close' : 'Adjclose'}).head()
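
Note that `rename` returns a new DataFrame and leaves the original untouched, which is why `df` still has an `Adj Close` column later in this notebook. A sketch on a made-up frame:

```python
import pandas as pd

frame = pd.DataFrame({"Adj Close": [1.0, 2.0]})

renamed = frame.rename(columns={"Adj Close": "Adjclose"})

print(list(frame.columns))    # ['Adj Close']  the original is unchanged
print(list(renamed.columns))  # ['Adjclose']
```

To keep the new name, assign the result back, for example `frame = frame.rename(...)`.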

<hr>

## Defining New Column
Assigning to a column name that does not exist yet creates a new column.

In [None]:
df["Difference"] = df.High - df.Low
df.head()

<hr>

## Change Index Name
Change index name to new index name

In [None]:
print(df.index.name)
df.index.name = "index_name"
df.head()

<hr>

<a id="ten"></a>

# 10. Drop Data

<hr>

## df.drop(columns=['columnName'])

In [None]:
df.drop(columns=['Adj Close']).head()

<hr>

## mySeries.drop(['a'])
This code removes the entry with the specified label and returns a new Series; the original Series is not modified.

In [None]:
Series.drop(['a'])
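
`drop` works on rows as well as columns, and in both cases it returns a new object. A sketch on a made-up frame:

```python
import pandas as pd

frame = pd.DataFrame({"Open": [1, 2, 3], "Adj Close": [4, 5, 6]})

without_column = frame.drop(columns=["Adj Close"])  # remove a column by name
without_rows = frame.drop(index=[0, 2])             # remove rows by index label

print(list(without_column.columns))  # ['Open']
print(list(without_rows.index))      # [1]
print(frame.shape)                   # (3, 2)  the original is unchanged
```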

<hr>

<a id="eleven"></a>

# 11. Convert Data Types

<hr>

## df.dtypes
This attribute shows the data type of each column, such as boolean, int, float, object (string), datetime, or categorical.

In [None]:
df.dtypes

<hr>

## df['columnName'] = df['columnName'].astype('dataType') 
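This code converts the specified column to the specified data type. `astype` returns a new object, so the result must be assigned back to keep the change; the example below only inspects the resulting dtype.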

In [None]:
df.Date.astype('category').dtypes
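
The sketch below shows a conversion that is kept by assigning it back. The frame is made up, and `pd.to_datetime`, which is not used in the original, is included as one common way to turn date strings into real dates.

```python
import pandas as pd

frame = pd.DataFrame({"Date": ["2016-07-01", "2016-06-30"],
                      "Volume": ["100", "250"]})  # numbers stored as strings

frame["Volume"] = frame["Volume"].astype("int64")  # assign back to keep the change
frame["Date"] = pd.to_datetime(frame["Date"])      # strings -> datetime64

print(frame.dtypes)
print(frame["Volume"].sum())  # 350
```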


In [None]:
df.head()

<hr>

## pd.melt(frame=dataFrameName,id_vars = 'columnName', value_vars= ['columnName']) 
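`pd.melt` reshapes a DataFrame from wide to long format: the `id_vars` columns are kept as identifiers, and each column listed in `value_vars` is unpivoted into a pair of `variable` and `value` columns. The examples below show the result.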

In [None]:

df_new = df.head()
melted = pd.melt(frame=df_new,id_vars = 'Date', value_vars= ['Low'])
melted

In [None]:
df_new_1 = df.tail()
melted = pd.melt(frame=df_new_1,id_vars = 'Date', value_vars= ['Low','Close','High'])
melted
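
The same reshaping is easier to follow on a tiny frame. In this sketch with made-up values, two rows and two value columns become four rows.

```python
import pandas as pd

wide = pd.DataFrame({"Date": ["2016-07-01", "2016-06-30"],
                     "Low": [1.0, 2.0],
                     "High": [3.0, 4.0]})

long = pd.melt(frame=wide, id_vars="Date", value_vars=["Low", "High"])

print(long)
print(long.shape)  # (4, 3): columns Date, variable, value
```

Each original row appears once per melted column, with the column name in `variable` and the cell content in `value`.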

<hr>

## Reference

- Pandas Tutorial For Beginners
(https://www.kaggle.com/kralmachine/pandas-tutorial-for-beginners#Sort-Data-)
- Wikipedia Pandas
(https://en.wikipedia.org/wiki/Pandas_(software))