# About DataFrames

A **DataFrame** is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

* A pandas DataFrame is often created by loading a dataset from existing storage.
* The storage can be a SQL database, a CSV file, an Excel file, etc.
* It can also be created from lists, from a dictionary, or from a list of dictionaries.
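
The creation routes listed above can be sketched as follows. This is a minimal illustration rather than a complete survey: the variable names and sample values are made up, and an in-memory string stands in for a CSV file on disk.

```python
import io
import pandas as pd

# From a list of lists: column names are passed separately.
rows = [['Rose', 1], ['John', 2]]
df_from_lists = pd.DataFrame(rows, columns=['Name', 'ID'])

# From a dictionary: keys become column names, values become the columns.
df_from_dict = pd.DataFrame({'Name': ['Rose', 'John'], 'ID': [1, 2]})

# From a list of dictionaries: each dictionary is one row.
records = [{'Name': 'Rose', 'ID': 1}, {'Name': 'John', 'ID': 2}]
df_from_records = pd.DataFrame(records)

# From CSV text: io.StringIO stands in for a file path (illustrative data).
csv_text = "Name,ID\nRose,1\nJohn,2\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

# Each route yields a table with the same two rows and two columns.
for frame in (df_from_lists, df_from_dict, df_from_records, df_from_csv):
    print(frame)
```

For a real file, `pd.read_csv` takes a path instead of the `StringIO` object; `pd.read_excel` and `pd.read_sql` cover the other storage types mentioned above.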

A **Series** represents a one-dimensional array of indexed data.
It has two main components:
1. An array of actual data.
2. An associated array of indexes or data labels.

The index is used to access individual data values. Selecting a single column of a DataFrame also returns a **Series**, so you can think of a pandas Series as the one-dimensional counterpart of a DataFrame.
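
To make the two components concrete, here is a small sketch. The values and the names `salary`, `df_small`, and `id_column` are illustrative only.

```python
import pandas as pd

# A Series built from data plus explicit labels (illustrative values).
salary = pd.Series([100000, 80000, 50000], index=['Rose', 'John', 'Jane'])

print(salary.values)   # the underlying array of data
print(salary.index)    # the associated labels
print(salary['John'])  # access one value through its label

# Selecting a single column of a DataFrame returns a Series.
df_small = pd.DataFrame({'Name': ['Rose', 'John'], 'ID': [1, 2]})
id_column = df_small['ID']
print(type(id_column))
```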

In [2]:
import pandas as pd

In [3]:
# Create a dictionary and convert it to a DataFrame

x = {'Name': ['Rose', 'John', 'Jane', 'Mary'],
     'ID': [1, 2, 3, 4],
     'Department': ['Architect Group', 'Software Group', 'Architect Group', 'Infrastructure'],
     'Salary': [100000, 80000, 50000, 60000]}
# The keys (Name, ID, Department, Salary) become the columns;
# the values in each list are the observations

# Build the DataFrame from the dictionary, with a custom index running from 1 to 4
df = pd.DataFrame(x, index=range(1, 5))

# Display the result
df

| | Name | ID | Department | Salary |
|---|---|---|---|---|
| 1 | Rose | 1 | Architect Group | 100000 |
| 2 | John | 2 | Software Group | 80000 |
| 3 | Jane | 3 | Architect Group | 50000 |
| 4 | Mary | 4 | Infrastructure | 60000 |


In [4]:
# Create a new DataFrame from a subset of the columns

# Two pairs of square brackets are needed: the outer pair indexes the DataFrame,
# and the inner pair is the list of column names to keep
df2 = df[['Name', 'ID']]
df2

| | Name | ID |
|---|---|---|
| 1 | Rose | 1 |
| 2 | John | 2 |
| 3 | Jane | 3 |
| 4 | Mary | 4 |


In [10]:
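# Access and slice values with the loc and iloc indexers
# loc selects by label (index and column names); iloc selects by integer position

# Accessing a single value: first row, third column
z = df.iloc[0, 2]
print(z)
print('=======')

# Slicing by position: rows 0-2, columns 1-2 (the end of the slice is excluded)
a = df.iloc[0:3, 1:3]
print(a)

print('=======')

# Slicing by label: index labels 1 to 3 (the end of the slice is included)
b = df.loc[1:3, ['ID', 'Department']]
print(b)

print('=======')

# Note: use a bare colon (:) as the row selector to select all rows
c = df.iloc[:, 3]
print(c)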

Architect Group
=======
   ID       Department
1   1  Architect Group
2   2   Software Group
3   3  Architect Group
=======
   ID       Department
1   1  Architect Group
2   2   Software Group
3   3  Architect Group
=======
1    100000
2     80000
3     50000
4     60000
Name: Salary, dtype: int64
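
Because the index of this DataFrame starts at 1, labels and positions do not coincide, which makes the difference between the two indexers easy to see. The sketch below rebuilds a smaller version of the table; the name `demo` and the reduced column set are illustrative.

```python
import pandas as pd

demo = pd.DataFrame(
    {'Name': ['Rose', 'John', 'Jane', 'Mary'], 'ID': [1, 2, 3, 4]},
    index=range(1, 5),
)

# iloc uses integer positions and excludes the end of a slice.
by_position = demo.iloc[0:2]   # positions 0 and 1, i.e. labels 1 and 2

# loc uses index labels and includes the end of a slice.
by_label = demo.loc[1:2]       # labels 1 and 2

print(by_position.equals(by_label))

# The same number refers to different rows for each indexer.
print(demo.iloc[1]['Name'])  # second row by position
print(demo.loc[1]['Name'])   # row whose label is 1
```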


In [None]:
# Use one of the existing columns (a key of the original dictionary) as the index
df3 = df.set_index('Name')
df3

In [None]:
# Replace the index with a list or range of your choice
new_index = [1, 2, 3, 4]
df4 = df.copy()  # copy first so that df itself is not modified
df4.index = new_index
df4

In [None]:
# Find the unique values in a column
dunique = df['Department'].unique()
print(dunique)

In [None]:
# Filter rows with a boolean condition (here, ID greater than 2)
filtered = df[df['ID'] > 2]
filtered

DataFrames provide numerous attributes and methods for data manipulation and analysis, including:

* shape: Returns the dimensions (number of rows and columns) of the DataFrame.
* info(): Prints a summary of the DataFrame, including data types and non-null counts.
* describe(): Generates summary statistics for numerical columns.
* head(), tail(): Return the first or last n rows of the DataFrame.
* mean(), sum(), min(), max(): Calculate summary statistics for columns.
* sort_values(): Sorts the DataFrame by one or more columns.
* groupby(): Groups data by one or more columns for aggregation.
* fillna(), drop(), rename(): Fill in missing values, drop rows or columns, and rename rows or columns.
* apply(): Applies a function to each row or column of the DataFrame (or to each element of a Series).
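
A few of these attributes and methods can be tried together in a short sketch. It rebuilds a table similar to the one above under the hypothetical name `staff`, with one salary deliberately left missing so that `fillna()` has something to do. Filling with 0 is an arbitrary choice for the demonstration.

```python
import pandas as pd

# Rebuild the example table, with one salary left missing for illustration.
staff = pd.DataFrame({
    'Name': ['Rose', 'John', 'Jane', 'Mary'],
    'Department': ['Architect Group', 'Software Group',
                   'Architect Group', 'Infrastructure'],
    'Salary': [100000, 80000, None, 60000],
})

print(staff.shape)        # (rows, columns)
print(staff.head(2))      # first two rows
staff.info()              # prints dtypes and non-null counts

# fillna: replace the missing salary with 0 (arbitrary choice for the demo).
staff['Salary'] = staff['Salary'].fillna(0)

# sort_values: highest salary first.
ranked = staff.sort_values('Salary', ascending=False)

# groupby + mean: average salary per department.
avg_by_dept = staff.groupby('Department')['Salary'].mean()

# rename, then Series.apply: rename a column and transform each of its values.
renamed = staff.rename(columns={'Salary': 'AnnualSalary'})
monthly = renamed['AnnualSalary'].apply(lambda s: s / 12)

print(ranked)
print(avg_by_dept)
print(monthly)
```

Note that `fillna()`, `sort_values()`, and `rename()` return new objects by default, which is why each result is assigned to a variable.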