DataFrame

In [None]:
'''
Data sets in pandas are usually stored as two-dimensional tables called DataFrames.
A Series is like a single column; a DataFrame is the whole table.

DataFrame Homogeneity:
1. A DataFrame can be heterogeneous across columns but is homogeneous within each column.
2. Each column in a DataFrame behaves like a Series.
3. Different columns can have different dtypes.
4. Common dtypes in a DataFrame:
    - numeric (int64, float64)
    - object (usually strings, but it can hold any Python object)
    - bool
    - datetime64
5. Each column has exactly one dtype. If the values in a column are of mixed types
   (for example, integers and a string), pandas stores the whole column as object.
'''
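
The dtype rules above can be checked with a small sketch. The column names and values below are hypothetical sample data, not taken from the notebook, and the exact dtype names printed may differ slightly between pandas versions.

```python
import pandas as pd

# Hypothetical sample data: every column holds one kind of value.
people = pd.DataFrame({
    "age": [31, 25, 47],                 # integers
    "height_m": [1.72, 1.80, 1.65],      # floats
    "name": ["Ana", "Ben", "Caro"],      # strings (object in pandas 2.x)
    "member": [True, False, True],       # booleans
    "joined": pd.to_datetime(["2021-01-05", "2022-06-30", "2023-11-12"]),
})

# One dtype per column; the DataFrame as a whole is heterogeneous.
print(people.dtypes)

# Selecting a single column returns a Series.
age_column = people["age"]
print(type(age_column))
```

Printing df.dtypes is a quick way to confirm what pandas inferred for each column before doing any arithmetic on it.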

In [1]:
import pandas as pd

In [2]:
data = {
  "x": [420, 380, 390],
  "y": [50, 40, 45]
}

df = pd.DataFrame(data)
print(df)

print(type(df))  # <class 'pandas.core.frame.DataFrame'>

     x   y
0  420  50
1  380  40
2  390  45
<class 'pandas.core.frame.DataFrame'>


In [3]:
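# Column 1 mixes integers with the string 'c', so pandas stores it as object.
data = {
    0: [5, 4, 3, 2],
    1: [45, 54, 76, 'c'],
    2: [1, 2, 3, 4]
}

# Use custom row labels instead of the default 0..n-1 index.
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])
print(df)

print(df[1].dtype)  # object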

   0   1  2
a  5  45  1
b  4  54  2
c  3  76  3
d  2   c  4
object
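
The object result above can be inspected further. This is a minimal sketch, assuming the same mixed values as column 1; the frame name `mixed` and the use of pd.to_numeric to coerce the column are illustrative choices, not part of the original notebook.

```python
import pandas as pd

# Same values as column 1 above: three integers and one string.
mixed = pd.DataFrame({"value": [45, 54, 76, "c"]})

# The column has a single dtype (object), but the elements keep their own Python types.
print(mixed["value"].dtype)
element_types = mixed["value"].map(type).unique()
print(element_types)

# One possible cleanup: coerce to numbers, turning unparseable entries into NaN.
numeric = pd.to_numeric(mixed["value"], errors="coerce")
print(numeric.dtype)
print(numeric)
```

After coercion the column is a float column, because NaN is a floating-point value.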


In [None]:
'''
Feature       |  .loc                     |  .iloc
--------------|---------------------------|---------------------------
Indexing      |  Label-based              |  Integer position-based
Slicing       |  Includes the end label   |  Excludes the end position
Works with    |  Row and column labels    |  0-based integer positions
Example       |  df.loc['a', '11']        |  df.iloc[0, 1]

'''
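
The slicing difference in the table can be seen side by side. This is a small sketch with a reduced, hypothetical version of the frames used in this notebook.

```python
import pandas as pd

df = pd.DataFrame(
    {"10": [5, 4, 3, 2], "12": [1, 2, 3, 4]},
    index=["a", "b", "c", "d"],
)

# Label slice: the end label 'c' is included.
by_label = df.loc["a":"c"]

# Position slice: the end position 2 is excluded, as with Python lists.
by_position = df.iloc[0:2]

print(by_label)
print(by_position)
```

Here by_label has three rows (a, b, c) while by_position has two (a, b), even though both slices start at the first row.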

In [None]:
'''
.loc (Label-Based Indexing):
    It selects rows and columns by their explicit labels (row index labels or column names).
    Syntax:
        df.loc[row_label, column_label]

.iloc (Integer-Based Indexing):
    It selects rows and columns by their integer positions (0-based), regardless of the labels.
    Syntax:
        df.iloc[row_position, column_position]
'''
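
Both syntaxes can address the same cell or the same sub-table. The sketch below assumes a frame shaped like the ones in this notebook, with the string 'c' replaced by a hypothetical 80 so that every column is numeric.

```python
import pandas as pd

df = pd.DataFrame(
    {"10": [5, 4, 3, 2], "11": [45, 54, 76, 80], "12": [1, 2, 3, 4]},
    index=["a", "b", "c", "d"],
)

# Single cell: row 'b' is at position 1, column '11' is at position 1.
cell_by_label = df.loc["b", "11"]
cell_by_position = df.iloc[1, 1]

# Lists select several rows and columns at once.
subset = df.loc[["a", "c"], ["10", "12"]]
same_subset = df.iloc[[0, 2], [0, 2]]

print(cell_by_label, cell_by_position)
print(subset)
```

Label-based selection keeps working if the rows are reordered, whereas position-based selection always refers to whatever currently sits at that position.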

In [4]:
# .loc

data = {
    '10': [5, 4, 3, 2],
    '11': [45, 54, 76, 'c'],
    '12': [1, 2, 3, 4]
}

df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])

# Label slice: both endpoints are included, so this returns rows 'a' through 'd'.
df.loc['a':'d']

   10  11  12
a   5  45   1
b   4  54   2
c   3  76   3
d   2   c   4


In [8]:
# .iloc

data = {
    0: [5, 4, 3, 2],
    1: [45, 54, 76, 'c'],
    2: [1, 2, 3, 4]
}

df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])
print(df)

# Position slice: everything from position 1 onward, so row 'a' is dropped.
df.iloc[1:]

   0   1  2
a  5  45  1
b  4  54  2
c  3  76  3
d  2   c  4


   0   1  2
b  4  54  2
c  3  76  3
d  2   c  4
