# DataFrames

- A DataFrame is a two-dimensional labeled data structure with columns of potentially different types.
- It can be thought of as a table or a spreadsheet, with rows and columns.
- Each column in a DataFrame is a Series.
- DataFrames can be created from dictionaries, lists of dictionaries, NumPy arrays, or other DataFrames.
- DataFrames support a wide range of operations including indexing, selection, filtering, merging, joining, and more.
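
The creation routes listed above can be sketched as follows. This is a minimal illustration, not part of the original notebook; the names `from_dict`, `from_records`, `from_array`, and `from_frame` and the sample values are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# From a dictionary of columns: keys become column names.
from_dict = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# From a list of dictionaries: each dictionary becomes one row.
from_records = pd.DataFrame([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])

# From a NumPy array: column names are passed via `columns`
# (without it they default to 0, 1, ...).
from_array = pd.DataFrame(np.array([[1, 10], [2, 20]]), columns=["id", "score"])

# From another DataFrame: copy() returns a separate object.
from_frame = from_dict.copy()

print(from_dict.equals(from_records))
print(from_array.shape)
```

The first two routes describe the same table, once column by column and once row by row, so they should produce equal DataFrames.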

In [3]:
dict1 = {'column1': [1, 2, 3, 4], 'column2': ['Pwskills', 'Pw', 'PWIOI', 'PhysicsWala']}

In [5]:
import pandas as pd

df = pd.DataFrame(dict1)
df

Unnamed: 0,column1,column2
0,1,Pwskills
1,2,Pw
2,3,PWIOI
3,4,PhysicsWala


In [6]:
df['column1']

0    1
1    2
2    3
3    4
Name: column1, dtype: int64

In [10]:
type(df)

pandas.core.frame.DataFrame

In [7]:
type(df['column1'])

pandas.core.series.Series

### Adding a new column

In [9]:
df['column3'] = [10, 20, 30, 50]
df

Unnamed: 0,column1,column2,column3
0,1,Pwskills,10
1,2,Pw,20
2,3,PWIOI,30
3,4,PhysicsWala,50
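
A short sketch of what column assignment expects, assuming a small stand-in frame named `df_demo` (a hypothetical name, not from the notebook): a list must supply one value per row, and a new column can also be derived from existing ones.

```python
import pandas as pd

df_demo = pd.DataFrame({"column1": [1, 2, 3, 4]})

# A list must have exactly one value per row.
df_demo["column3"] = [10, 20, 30, 50]

# A column can also be computed from existing columns.
df_demo["total"] = df_demo["column1"] + df_demo["column3"]

# A list of the wrong length raises ValueError.
try:
    df_demo["bad"] = [1, 2]
    length_error = False
except ValueError:
    length_error = True

print(df_demo["total"].tolist())
print(length_error)
```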


In [11]:
df.iloc[2, 2] # integer-position-based indexing: row position 2, column position 2

30

In [12]:
df.iloc[0:2, 0:3]

Unnamed: 0,column1,column2,column3
0,1,Pwskills,10
1,2,Pw,20


In [14]:
df.loc[1, 'column3'] # label-based indexing: row label 1, column label 'column3'

20

In [15]:
df.loc[1, 3] # loc expects labels; 3 is not a column label here, so this raises a KeyError

KeyError: 3
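
When a row label has to be combined with a column position, one possible approach is to convert one into the other first. The sketch below assumes a stand-in frame `df_demo` that mirrors `df`; the variable names are illustrative only.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "column1": [1, 2, 3, 4],
    "column2": ["Pwskills", "Pw", "PWIOI", "PhysicsWala"],
    "column3": [10, 20, 30, 50],
})

# Option 1: translate the column position into its label, then use loc.
value_via_loc = df_demo.loc[1, df_demo.columns[2]]

# Option 2: translate the column label into its position, then use iloc.
# With the default index, row label 1 is also row position 1.
col_pos = df_demo.columns.get_loc("column3")
value_via_iloc = df_demo.iloc[1, col_pos]

print(value_via_loc, value_via_iloc)
```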

In [16]:
df.loc[1:2, 'column2':'column3']

Unnamed: 0,column2,column3
1,Pw,20
2,PWIOI,30


## iloc vs loc

### iloc

- Stands for "integer location".
- Selects rows and columns by integer position, not by label.
- The syntax is ```DataFrame.iloc[row_position, column_position]```.
- Positions start at 0.
- Slices exclude the end position, like Python slicing.
- It ignores labels entirely: passing a label such as a column name raises an error.
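
The points above can be checked on a frame whose row labels are strings, so that positions and labels cannot be confused. This is a sketch with an assumed frame `df_demo` and assumed index labels `"a"` to `"d"`.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"column1": [1, 2, 3, 4], "column3": [10, 20, 30, 50]},
    index=["a", "b", "c", "d"],
)

first_two = df_demo.iloc[0:2]   # rows at positions 0 and 1; position 2 is excluded
last_row = df_demo.iloc[-1]     # negative positions count from the end
cell = df_demo.iloc[2, 1]       # row position 2, column position 1

# A label is not a position, so iloc rejects it.
try:
    df_demo.iloc[0, "column1"]
    label_rejected = False
except (ValueError, IndexError, TypeError):
    label_rejected = True

print(list(first_two.index), cell, label_rejected)
```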

### loc

- Stands for "location".
- Selects rows and columns by label.
- The syntax is ```DataFrame.loc[row_label, column_label]```.
- Slices include both the start and the end label.
- It is designed to handle labels of any type, including non-numeric labels such as column names.
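
To see the inclusive label slice next to the exclusive position slice, the following sketch uses an assumed frame `df_demo` with string row labels. It also shows `loc` taking a boolean mask for the rows, which is one common way to filter.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"column1": [1, 2, 3, 4], "column3": [10, 20, 30, 50]},
    index=["a", "b", "c", "d"],
)

by_label = df_demo.loc["b":"c"]    # labels "b" and "c": both ends included
by_position = df_demo.iloc[1:2]    # position 1 only: the end is excluded

# loc also accepts a boolean mask for the rows plus a column label.
mask_rows = df_demo.loc[df_demo["column3"] > 15, "column1"]

print(list(by_label.index), list(by_position.index), mask_rows.tolist())
```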

In [17]:
list1 = [11, 44, 66, 22]
list2 = ['Alex', 'Sind', 'Riya', 'Sim']
list(zip(list1,list2))

[(11, 'Alex'), (44, 'Sind'), (66, 'Riya'), (22, 'Sim')]

In [21]:
df2 = pd.DataFrame(list(zip(list1,list2)), columns = ['Marks', 'StudentName'])
df2

Unnamed: 0,Marks,StudentName
0,11,Alex
1,44,Sind
2,66,Riya
3,22,Sim


In [22]:
print('Concatenation along rows:')
new_df = pd.concat([df, df2])
new_df

Concatenation along rows:


Unnamed: 0,column1,column2,column3,Marks,StudentName
0,1.0,Pwskills,10.0,,
1,2.0,Pw,20.0,,
2,3.0,PWIOI,30.0,,
3,4.0,PhysicsWala,50.0,,
0,,,,11.0,Alex
1,,,,44.0,Sind
2,,,,66.0,Riya
3,,,,22.0,Sim
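
The output above shows two side effects of row-wise concatenation when the frames share no columns: the missing cells are filled with NaN (which turns the integer columns into floats), and the original row labels repeat. A minimal sketch, using the assumed stand-in frames `left` and `right`, shows how `ignore_index=True` renumbers the rows:

```python
import pandas as pd

left = pd.DataFrame({"column1": [1, 2], "column2": ["Pwskills", "Pw"]})
right = pd.DataFrame({"Marks": [11, 44], "StudentName": ["Alex", "Sind"]})

stacked = pd.concat([left, right])                        # keeps original labels
renumbered = pd.concat([left, right], ignore_index=True)  # fresh 0..n-1 labels

print(list(stacked.index))
print(list(renumbered.index))
print(stacked["column1"].dtype)
```

Renumbering matters if later code uses `loc` with a row label, because a duplicated label returns more than one row.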


In [23]:
print('Concatenation along columns:')
new_df2 = pd.concat([df,df2], axis=1)
new_df2

Concatenation along columns:


Unnamed: 0,column1,column2,column3,Marks,StudentName
0,1,Pwskills,10,11,Alex
1,2,Pw,20,44,Sind
2,3,PWIOI,30,66,Riya
3,4,PhysicsWala,50,22,Sim
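
Column-wise concatenation lines rows up by index label rather than by position. The result above is clean only because `df` and `df2` share the labels 0 to 3. The sketch below, with assumed frames `left` and `right` whose labels only partly overlap, illustrates the difference between the default outer join and `join="inner"`.

```python
import pandas as pd

left = pd.DataFrame({"column1": [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({"Marks": [44, 66, 22]}, index=[1, 2, 3])

# Default (outer): every label from either frame appears; gaps become NaN.
side_by_side = pd.concat([left, right], axis=1)

# Inner: only labels present in both frames are kept.
overlap_only = pd.concat([left, right], axis=1, join="inner")

print(side_by_side)
print(overlap_only)
```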
