## 🐼 What is Pandas?

- Pandas is an open-source data analysis library for **Python**.
- It builds on the **power and speed of NumPy** arrays to make data analysis and preprocessing easier for data scientists.
- It provides **rich and robust** operations on labeled data, such as filtering, summarizing, and reading or writing files.

---

### 🔧 Pandas works well with:
- **NumPy**
- **Excel-like tabular data**
- **Python built-in structures** such as:
  - `list`
  - `dict`
  - `tuple`
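
As a rough illustration of the inputs listed above, the sketch below builds small DataFrames from a `dict`, a list of tuples, and a NumPy array. The column names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# From a dict of lists: the keys become column names.
from_dict = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})

# From a list of tuples: each tuple is one row; column names are passed separately.
from_tuples = pd.DataFrame([("a", 1), ("b", 2)], columns=["name", "score"])

# From a NumPy array: a 2-D array maps directly onto rows and columns.
from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])

print(from_dict.equals(from_tuples))  # True
print(from_array.shape)               # (3, 2)
```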


## 📊 Pandas Data Structures

Pandas has **two main data structures**:

- **a) Series** – a one-dimensional array with an index. A single column (or row) of a DataFrame is a Series.

- **b) DataFrame** – a tabular, spreadsheet-like structure of rows and columns, where each row holds one value per column.

---

- A labeled one-dimensional array capable of holding any type of data – **Series**

- A labeled two-dimensional data structure with columns of potentially different types of data – **DataFrame**
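
A minimal sketch of the two structures, using illustrative names (`marks`, `table`) that are not part of the notebook cells:

```python
import pandas as pd

# A Series: one-dimensional, every value carries an index label.
marks = pd.Series([49, 90, 89], index=["miku", "milan", "karan"], name="marks")
print(marks["milan"])  # 90

# A DataFrame: two-dimensional, columns may hold different data types.
table = pd.DataFrame({"name": ["miku", "milan"], "marks": [49, 90]})

# Selecting one column of a DataFrame returns a Series.
print(type(table["marks"]).__name__)    # Series
print(table.ndim, table["marks"].ndim)  # 2 1
```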


In [1]:
import pandas as pd
import numpy as np

In [2]:
dict1 = {
    "name": ['miku','milan','karan'],
    "marks": [49,90,89],
    "city": ["Talcher","Cuttack","Angul"]
}

In [3]:
# convert the dictionary into a DataFrame (keys become column names)
df = pd.DataFrame(dict1)

In [4]:
df

,name,marks,city
0,miku,49,Talcher
1,milan,90,Cuttack
2,karan,89,Angul


In [5]:
# write the DataFrame to a CSV file (the index is written as the first column)
df.to_csv('bhaiChara.csv')

In [6]:
# write another CSV file, this time without the index column
df.to_csv('bhaiChara_without_Index.csv', index=False)
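
To see what `index=False` changes, the sketch below compares the two outputs as strings instead of files. It relies on `to_csv` returning the CSV text when no path is given; the `sample` frame is a small stand-in for `df`.

```python
import io
import pandas as pd

sample = pd.DataFrame({"name": ["miku", "milan"], "marks": [49, 90]})

# With no path, to_csv returns the CSV text instead of writing a file.
with_index = sample.to_csv()
without_index = sample.to_csv(index=False)

print(with_index.splitlines()[0])     # ,name,marks
print(without_index.splitlines()[0])  # name,marks

# Reading the indexed version back yields an extra "Unnamed: 0" column
# unless index_col=0 is passed to read_csv.
round_trip = pd.read_csv(io.StringIO(with_index))
print(list(round_trip.columns))  # ['Unnamed: 0', 'name', 'marks']
```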

In [7]:
# return the first 2 rows
df.head(2)

,name,marks,city
0,miku,49,Talcher
1,milan,90,Cuttack


In [8]:
# return the last 2 rows
df.tail(2)

,name,marks,city
1,milan,90,Cuttack
2,karan,89,Angul
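
A short sketch of how the `n` argument of `head` and `tail` behaves, using a made-up seven-row frame so the default of 5 is visible:

```python
import pandas as pd

sample = pd.DataFrame({"marks": [49, 90, 89, 70, 65, 30, 55]})

print(len(sample.head()))   # 5 (the default n)
print(len(sample.tail(2)))  # 2

# A negative n drops rows from the other end: head(-2) is everything except the last 2 rows.
print(sample.head(-2)["marks"].tolist())  # [49, 90, 89, 70, 65]
```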


In [9]:
# generate summary statistics for the numerical columns
df.describe()

,marks
count,3.0
mean,76.0
std,23.388031
min,49.0
25%,69.0
50%,89.0
75%,89.5
max,90.0
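
The figures above can be reproduced by hand. The sketch below recomputes them for the same three marks; it shows that `std` is the sample standard deviation (`ddof=1`) and that the percentiles use linear interpolation between sorted values.

```python
import pandas as pd

marks = pd.Series([49, 90, 89], name="marks")
stats = marks.describe()

# mean = (49 + 90 + 89) / 3
print(stats["mean"])  # 76.0

# std divides the squared deviations by n - 1 (ddof=1), not by n.
print(round(marks.std(ddof=1), 6))  # 23.388031

# Sorted values are 49, 89, 90; the 25% point lies halfway between 49 and 89.
print(marks.quantile(0.25))  # 69.0
print(marks.quantile(0.75))  # 89.5
```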


In [10]:
df

,name,marks,city
0,miku,49,Talcher
1,milan,90,Cuttack
2,karan,89,Angul


In [12]:
# Update the marks with .loc so that each assignment happens in a single step.
# Chained assignment (df['marks'][0] = 78) triggers a warning, and under
# Copy-on-Write (the default behaviour in pandas 3.0) it never updates the
# original DataFrame, because the intermediate object behaves as a copy.
# See: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df.loc[0, 'marks'] = 78
df.loc[1, 'marks'] = 80
df.loc[2, 'marks'] = 50

In [13]:
df

,name,marks,city
0,miku,78,Talcher
1,milan,80,Cuttack
2,karan,50,Angul
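
A minimal sketch of single-step assignment with `.loc`, which updates the original DataFrame in place. The `sample` frame is a stand-in that mirrors the name and marks columns of `df`.

```python
import pandas as pd

sample = pd.DataFrame({"name": ["miku", "milan", "karan"], "marks": [49, 90, 89]})

# Single-step assignment: row label first, then column label.
sample.loc[0, "marks"] = 78

# Several rows at once: a list of row labels and a matching list of values.
sample.loc[[1, 2], "marks"] = [80, 50]

print(sample["marks"].tolist())  # [78, 80, 50]
```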


In [14]:
df.to_csv('changed_profile.csv')

In [15]:
df

,name,marks,city
0,miku,78,Talcher
1,milan,80,Cuttack
2,karan,50,Angul


In [16]:
# replace the default integer index with custom labels
df.index = ['ek','dui','tini']

In [17]:
df

,name,marks,city
ek,miku,78,Talcher
dui,milan,80,Cuttack
tini,karan,50,Angul
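
Once the index holds custom labels, rows can be addressed either by label or by position. The sketch below contrasts `.loc` (labels) with `.iloc` (integer positions) on a stand-in frame with the same index labels.

```python
import pandas as pd

sample = pd.DataFrame(
    {"name": ["miku", "milan", "karan"], "marks": [78, 80, 50]},
    index=["ek", "dui", "tini"],
)

# Label-based lookup uses the new index names.
print(sample.loc["dui", "marks"])  # 80

# Position-based lookup is unaffected by the relabeling.
print(sample.iloc[1, 1])  # 80

# reset_index(drop=True) discards the labels and restores 0, 1, 2.
restored = sample.reset_index(drop=True)
print(list(restored.index))  # [0, 1, 2]
```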


In [21]:
newDf = pd.DataFrame(np.random.rand(200,6))

In [49]:
# newDf
# newDf.head(7)
# type(newDf)
# newDf.describe()
# newDf.dtypes
# newDf[0][1] = 'miku'  # reads as newDf[column_label][row_label]; 0 and 1 here are labels, not positions (prefer newDf.loc[1, 0] for assignment)
# newDf.index
newDf.columns

RangeIndex(start=0, stop=6, step=1)
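
When no column names are given, pandas labels the columns 0 to 5 with a `RangeIndex`, as the output above shows. The sketch below reproduces this and then assigns custom names; the seeded generator and the letter names are assumptions made for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the example is repeatable (assumption)
new_df = pd.DataFrame(rng.random((200, 6)))

print(new_df.shape)    # (200, 6)
print(new_df.columns)  # RangeIndex(start=0, stop=6, step=1)

# Assigning a list to .columns replaces the default integer labels.
new_df.columns = list("abcdef")

# Label-based and position-based access now point at the same cell.
print(new_df.loc[1, "a"] == new_df.iloc[1, 0])  # True
```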