# Pandas DataFrames
## 1. Creating a Pandas DataFrame
A DataFrame is a two-dimensional, labeled data structure in Pandas.
It is similar to an Excel spreadsheet or SQL table.
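As a minimal sketch of what "two-dimensional and labeled" means, the example below builds a small frame and inspects its shape and labels. The `scores` frame and its values are hypothetical and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical example data, used only to illustrate the structure
scores = pd.DataFrame(
    data=[[90, 85], [70, 95]],
    index=["alice", "bob"],        # row labels
    columns=["math", "physics"],   # column labels
)

print(scores.shape)           # (number of rows, number of columns)
print(list(scores.index))     # row labels
print(list(scores.columns))   # column labels
```

Every value is addressed by a row label and a column label, much like a cell in a spreadsheet.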

In [35]:
import numpy as np
import pandas as pd

In [95]:
from numpy.random import randn
np.random.seed(101)

In [97]:
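# Create a 5x4 DataFrame of random values drawn from a standard normal distribution,
# with labeled rows (index) and labeled columns
df_rnd = pd.DataFrame(data=randn(5, 4), index=['A', 'B', 'C', 'D', 'E'], columns=['W', 'X', 'Y', 'Z'])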

In [99]:
df_rnd

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509


## 2. Creating a DataFrame from a dictionary
A DataFrame can be created from a dictionary of lists, where each key becomes a column, or from a dictionary of scalar (single) values, which produces a single row.

In [None]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000]
}
df = pd.DataFrame(data)


In [72]:
my_dict={"Name":"Sam","Age":40,"Salary":10000}

In [74]:
# With scalar values, an explicit index tells pandas how many rows to create
df = pd.DataFrame(data=my_dict, index=[0])

In [76]:
df

   Name  Age  Salary
0   Sam   40   10000
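The `index=[0]` argument above matters: when every dictionary value is a scalar, pandas cannot infer how many rows to build. The sketch below shows the resulting error and one alternative, wrapping the dictionary in a list. The variable names `raised`, `df_with_index`, and `df_from_list` are illustrative only.

```python
import pandas as pd

my_dict = {"Name": "Sam", "Age": 40, "Salary": 10000}

# All-scalar values and no index: pandas raises a ValueError
try:
    pd.DataFrame(data=my_dict)
    raised = False
except ValueError:
    raised = True

# Two ways to get a single-row DataFrame from the same dictionary
df_with_index = pd.DataFrame(data=my_dict, index=[0])
df_from_list = pd.DataFrame([my_dict])  # a list holding one record

print(raised)
print(df_with_index.shape, df_from_list.shape)
```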


In [78]:
my_dict={'Name':['Beryl','Kath','Sam','Mandy'],
        'Age':[25,30,40,23],
        'Salary':[10000,20000,35000,12000],}

In [80]:
df=pd.DataFrame(data=my_dict)

In [82]:
df

    Name  Age  Salary
0  Beryl   25   10000
1   Kath   30   20000
2    Sam   40   35000
3  Mandy   23   12000


In [84]:
# Display the DataFrame

display(df)

    Name  Age  Salary
0  Beryl   25   10000
1   Kath   30   20000
2    Sam   40   35000
3  Mandy   23   12000


## 3. Accessing Data
You can access specific columns, rows, or values in a DataFrame.

In [117]:
# Access a single column
df_rnd['W']

A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64

In [121]:
display(df['Name'])


0    Beryl
1     Kath
2      Sam
3    Mandy
Name: Name, dtype: object

In [108]:
type(df_rnd['W'])

pandas.core.series.Series

In [110]:
type(df)

pandas.core.frame.DataFrame

In [123]:
# Access multiple columns
display(df[['Name', 'Salary']])

    Name  Salary
0  Beryl   10000
1   Kath   20000
2    Sam   35000
3  Mandy   12000


In [129]:

# Access a specific row by integer position (iloc[1] is the second row)
display(df.iloc[1])

Name       Kath
Age          30
Salary    20000
Name: 1, dtype: object
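`iloc` selects by integer position, while `loc` selects by index label. The two only look alike when the index is the default 0, 1, 2, ... The sketch below uses a hypothetical `people` frame with string labels to make the difference visible, and also shows how to read a single value.

```python
import pandas as pd

# Hypothetical frame with string row labels, so labels and positions differ
people = pd.DataFrame(
    {"Name": ["Beryl", "Kath", "Sam"], "Age": [25, 30, 40]},
    index=["a", "b", "c"],
)

row_by_position = people.iloc[1]        # second row, whatever its label is
row_by_label = people.loc["b"]          # row whose index label is "b"
single_value = people.loc["c", "Age"]   # one cell: row label, column name
same_value = people.iloc[2, 1]          # the same cell by row and column position

print(row_by_position["Name"], row_by_label["Name"])
print(single_value, same_value)
```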

## 4. Accessing Columns - Best Practices
When accessing a single column in a DataFrame, you might see both of these notations:

### 1. Using dot notation (not recommended)
```
df_rnd.W
```
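While this works, it is not the best practice because:
- A DataFrame already exposes many attributes and methods through dot access, so a column can be mistaken for one of them.
- If a column name matches an existing attribute or method (for example `count`), dot access returns that attribute instead of the column.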
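A short sketch of such a conflict follows. The `orders` frame and its column names are hypothetical, chosen because `count` is also a DataFrame method and `unit price` is not a valid Python identifier.

```python
import pandas as pd

# Hypothetical frame: "count" collides with DataFrame.count,
# and "unit price" cannot be written as an attribute name at all
orders = pd.DataFrame({"count": [3, 5], "unit price": [2.5, 4.0]})

print(callable(orders.count))          # True: dot access finds the method, not the column
print(orders["count"].tolist())        # [3, 5]: bracket access returns the column
print(orders["unit price"].tolist())   # [2.5, 4.0]
```

Bracket access behaves the same for every column name, which is why it is the safer default.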

### 2. Using bracket notation (recommended)
```
df_rnd['W']
```
This is the preferred way because:
- It avoids conflicts with DataFrame methods.
- It is more explicit and readable.

For multiple columns, pass a list of column names inside the brackets, which gives double brackets:
```
df_rnd[['W', 'X']]
```
This returns a DataFrame, even when the list holds a single column name, whereas single brackets with one name return a Series.
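The difference in return type can be checked directly. This is a minimal sketch on a hypothetical two-column `frame`, not the notebook's `df_rnd`.

```python
import pandas as pd

frame = pd.DataFrame({"W": [1.0, 2.0], "X": [3.0, 4.0]})

single = frame["W"]            # single brackets with one name: Series
one_col = frame[["W"]]         # list with one name: DataFrame with one column
two_cols = frame[["W", "X"]]   # list with two names: DataFrame with two columns

print(type(single).__name__, type(one_col).__name__)   # Series DataFrame
print(one_col.shape, two_cols.shape)                   # (2, 1) (2, 2)
```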

In [112]:
df_rnd.W

A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64

However, dot notation can be ambiguous, because a DataFrame already exposes many attributes and methods after `df.`, and a column whose name matches one of them cannot be reached this way. The safer choice is `[]` notation when requesting a column by name.

In [135]:
df_rnd[['W', 'X']]

          W         X
A  2.706850  0.628133
B  0.651118 -0.319318
C -2.018168  0.740122
D  0.188695 -0.758872
E  0.190794  1.978757
