## Getting Started with Pandas


**Important Message**

- 🔰 Notes Link - [Python Learning From Zero to Hero](https://gnnandan.notion.site/Python-Learning-b3aa7ed958fd44659aa1be95f0f225a6)
- 🔰 Open Source Developers Community Link (OSDC) - [Curious Developers Community](https://linktr.ee/curiousdevelopers.community) 
- 🔰 Official Website - [Curious Community Ecosystem](https://curiousdevelopers.in)
- 🔰 Getting Started With Pandas - [Pandas Official Docs](https://pandas.pydata.org/pandas-docs/version/1.4/getting_started/index.html)
- 🔰 Pandas Cheatsheet - [CheatSheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)
- 🔰 **All Pandas Operations** - [Pandas Best Tutorials](https://www.youtube.com/watch?v=vmEHCJofslg)

### Important Builtin Methods

The methods below assume the conventional import: `import pandas as pd`

1. **DataFrame Creation**
   1. pd.DataFrame(data): Create a DataFrame from data structures such as lists, dictionaries, or NumPy arrays.
   2. pd.read_csv(file_path): Read a CSV file and create a DataFrame.
   <br></br>
2. **Data Exploration and Manipulation**
   1. df.head(n): Get the first n rows of the DataFrame.
   2. df.tail(n): Get the last n rows of the DataFrame.
   3. df.info(): Display a summary of the DataFrame, including column names, non-null counts, and data types.
   4. df.describe(): Generate descriptive statistics (count, mean, std, min, quartiles, max); by default only numeric columns are summarized.
   5. df.shape: Get the dimensions (rows, columns) of the DataFrame.
   6. df.columns: Get the column labels of the DataFrame.
   7. df.dtypes: Get the data types of each column.
   8. df.isnull(): Return a Boolean DataFrame that marks missing values as True.
   9. df.dropna(): Remove rows with missing values (pass axis=1 to remove columns instead).
   10. df.groupby(by): Group the DataFrame by one or more columns.
   11. df.sort_values(by): Sort the DataFrame based on one or more columns.
   12. df.rename(columns=mapping): Rename columns using a dictionary of old names to new names.
   13. df.merge(df2): Merge two DataFrames based on common columns.
<br></br>
3. **Data Selection and Filtering**
   1. df.index: Get the row labels (index) of the DataFrame.
   2. df[column_name]: Select a single column; pass a list of names, df[[col1, col2]], to select multiple columns.
   3. df.loc[row_indexer, column_indexer]: Access specific rows and columns using labels.
   4. df.iloc[row_indexer, column_indexer]: Access specific rows and columns using integer-based indexing.
   5. df.query(expression): Filter rows based on a Boolean expression.
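
The methods listed above can be tried together on a small DataFrame. The following is a minimal sketch, not a complete tour: the data, the column names (`team`, `player`, `score`, `city`), and the variable names are all hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only
df = pd.DataFrame({
    "team": ["A", "A", "B", "B"],
    "player": ["p1", "p2", "p3", "p4"],
    "score": [10.0, None, 7.0, 9.0],
})

# Exploration
print(df.shape)            # (rows, columns)
print(df.isnull().sum())   # number of missing values per column

# Manipulation
clean = df.dropna()                                    # drop rows containing missing values
by_team = clean.groupby("team")["score"].mean()        # mean score per team
ranked = clean.sort_values("score", ascending=False)   # highest score first
renamed = clean.rename(columns={"score": "points"})    # rename one column

# Merge with a second (also hypothetical) frame on the shared "team" column
cities = pd.DataFrame({"team": ["A", "B"], "city": ["X", "Y"]})
merged = clean.merge(cities, on="team")

# Selection and filtering
first_label = df.loc[0, "player"]     # label-based access
first_position = df.iloc[0, 1]        # position-based access
high = df.query("score > 8")          # rows where the Boolean expression holds

print(by_team)
print(high)
```

Each call returns a new object and leaves `df` unchanged, which is why the results are assigned to separate names here.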


In [4]:
# importing the pandas library
import pandas as pd

# listing all public names (functions, classes, submodules) in the pandas namespace
dir(pd)

# checking the pandas version
print("My Pandas Version:", pd.__version__)

My Pandas Version: 1.4.4


### Creating a pandas DataFrame from a List

**Syntax**
1. Import the pandas library as pandasRefVar
2. data = [listData]
3. pandasRefVar.DataFrame(data)

In [16]:
import pandas as pd

data = [["Name","Nandan"],["CDC","Curious Developers Community"],["CT","Curious Technologies"]]

df = pd.DataFrame(data)
print(df)

# adding column names to a DataFrame (a flat list with one name per column)
dfColumnName = pd.DataFrame(data, columns=["Personal Info", "Domain Name"])
print(dfColumnName)

      0                             1
0  Name                        Nandan
1   CDC  Curious Developers Community
2    CT          Curious Technologies
  Personal Info                   Domain Name
0          Name                        Nandan
1           CDC  Curious Developers Community
2            CT          Curious Technologies
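
One detail worth checking when naming columns: a flat list of names and the same list wrapped in a second pair of brackets print identically, but they do not produce the same kind of column index. The sketch below compares the two using the first two rows of the data above; the variable names `flat` and `nested` are hypothetical.

```python
import pandas as pd

data = [["Name", "Nandan"], ["CDC", "Curious Developers Community"]]

# A flat list of names gives ordinary column labels
flat = pd.DataFrame(data, columns=["Personal Info", "Domain Name"])
print(type(flat.columns).__name__)

# A list of lists is interpreted as the levels of a MultiIndex
nested = pd.DataFrame(data, columns=[["Personal Info", "Domain Name"]])
print(type(nested.columns).__name__)

# With flat labels, selecting one column returns a Series
domain = flat["Domain Name"]
print(domain.tolist())
```

For a single header row, the flat list is usually the simpler choice because column selection then behaves as the rest of this guide describes.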


### Creating a pandas DataFrame from a Dictionary

**Syntax**
1. Import the pandas library as pandasRefVar
2. data = {columnName: listOfValues}
3. pandasRefVar.DataFrame(data)

In [18]:
import pandas as pd

data = {
        'name': ['John', 'Mike', 'Suresh', 'Tracy'],
        'Age': [25, 32, 30, 26],
        'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
        'Salary':[1000000, 1200000, 900000, 1100000]
        }

df = pd.DataFrame(data)
print(df)

     name  Age Profession   Salary
0    John   25  Developer  1000000
1    Mike   32    Analyst  1200000
2  Suresh   30      Admin   900000
3   Tracy   26         HR  1100000
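
Because each dictionary key becomes a column, the columns can be used directly in calculations and filters. The following is a small sketch on the same data; the variable names `average_salary`, `older`, and `by_salary` are hypothetical.

```python
import pandas as pd

data = {
    "name": ["John", "Mike", "Suresh", "Tracy"],
    "Age": [25, 32, 30, 26],
    "Profession": ["Developer", "Analyst", "Admin", "HR"],
    "Salary": [1000000, 1200000, 900000, 1100000],
}
df = pd.DataFrame(data)

# A single column is a Series, so aggregate methods apply to it
average_salary = df["Salary"].mean()

# Boolean mask: keep only rows where Age is greater than 26
older = df[df["Age"] > 26]

# Sort by Salary, highest first
by_salary = df.sort_values("Salary", ascending=False)

print(average_salary)
print(older["name"].tolist())
print(by_salary["name"].tolist())
```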


### Creating a pandas DataFrame from a Dict of Lists

**Syntax**
1. Import the pandas library as pandasRefVar
2. listData = [values]
3. dictData = {columnName: listData}
4. pandasRefVar.DataFrame(dictData)

In [27]:
import pandas as pd

# list data (one list per column)
names = ['John', 'Mike', 'Suresh', 'Tracy']
ages = [25, 32, 30, 26]
professions = ['Developer', 'Analyst', 'Admin', 'HR']
salaries = [1000000, 1200000, 900000, 1100000]

# dictionary data (column name -> list of values)
table = {'name': names, 'Age': ages, 'Profession': professions, 'Salary': salaries}

df = pd.DataFrame(table)
print(df)

print()
# printing the column names
print(df.columns)

print()
# checking the dimensions of the DataFrame
print(df.shape)  # (number of rows, number of columns)

print()
# printing all the values as a NumPy array
print(df.values)

     name  Age Profession   Salary
0    John   25  Developer  1000000
1    Mike   32    Analyst  1200000
2  Suresh   30      Admin   900000
3   Tracy   26         HR  1100000

Index(['name', 'Age', 'Profession', 'Salary'], dtype='object')

(4, 4)

[['John' 25 'Developer' 1000000]
 ['Mike' 32 'Analyst' 1200000]
 ['Suresh' 30 'Admin' 900000]
 ['Tracy' 26 'HR' 1100000]]
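
The attributes printed above return ordinary Python and NumPy objects, so they can be unpacked and reused. A brief sketch under that assumption, using two of the columns from the example; `to_numpy()` is used here as the method form of `.values`, and the variable names are hypothetical.

```python
import pandas as pd

table = {
    "name": ["John", "Mike", "Suresh", "Tracy"],
    "Age": [25, 32, 30, 26],
}
df = pd.DataFrame(table)

# shape is a (rows, columns) tuple, so it can be unpacked
rows, cols = df.shape

# columns is an Index; tolist() turns it into a plain Python list
column_names = df.columns.tolist()

# Mixed column types (text and integers) give an array with dtype object
arr = df.to_numpy()

# A single numeric column keeps its integer dtype
ages = df["Age"].to_numpy()

print(rows, cols)
print(column_names)
print(arr.dtype, ages.dtype)
```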


### Working with Real Data

In [58]:
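# importing the pandas library
import pandas as pd

# storing the file path in a variable
csv_path = 'C:/Users/NANDANGN/Desktop/Python Programming/Python_Programming/TopSellingAlbums.csv'

# reading the CSV file into a DataFrame
df = pd.read_csv(csv_path)

# previewing the first five rows; a bare expression is displayed only when it is
# the last line of a notebook cell, so wrap it in print() to see it mid-cell
df.head()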

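# understanding the size of the DataFrame: (rows, columns)
print(df.shape)

# selecting multiple columns by passing a list of names (returns a DataFrame)
x = df[['Artist', 'Length']]
type(x)  # pandas DataFrame (not displayed mid-cell)
print(x)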

print()

# iloc - accessing an individual value by integer row and column position
firstArtist_way1 = df.iloc[0, 0]
print(firstArtist_way1)

# loc - accessing a value by row label and column name
print(df.loc[0, 'Artist'])

# loc[] - slicing the DataFrame by label (both endpoints are included)
slicedData = df.loc[0:2, 'Artist':'Released']
slicedData

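# changing the index
new_index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

# note: plain assignment does not copy, so reFormattedIndex and df refer to the
# same object and df's index changes too; use df.copy() to leave df untouched
reFormattedIndex = df
reFormattedIndex.index = new_index

print(df.index)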

print()

# printing the datatypes of each column
print(df.dtypes)

(8, 10)
            Artist   Length
0  Michael Jackson  0:42:19
1            AC/DC  0:42:11
2       Pink Floyd  0:42:49
3  Whitney Houston  0:57:44
4        Meat Loaf  0:46:33
5           Eagles  0:43:08
6         Bee Gees  1:15:54
7    Fleetwood Mac  0:40:01

Michael Jackson
Michael Jackson
Index(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], dtype='object')

Artist                               object
Album                                object
Released                              int64
Length                               object
Genre                                object
Music Recording Sales (millions)    float64
Claimed Sales (millions)              int64
Released.1                           object
Soundtrack                           object
Rating                              float64
dtype: object
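
Once the index has been replaced with letters, `loc` and `iloc` no longer accept the same values. The sketch below illustrates the difference on a four-row, three-column subset of the TopSellingAlbums data from this section; the subset and the variable names `by_label` and `by_position` are chosen only for illustration.

```python
import pandas as pd

# Four rows and three columns taken from the TopSellingAlbums data
df = pd.DataFrame({
    "Artist": ["Michael Jackson", "AC/DC", "Pink Floyd", "Whitney Houston"],
    "Album": ["Thriller", "Back in Black", "The Dark Side of the Moon", "The Bodyguard"],
    "Released": [1982, 1980, 1973, 1992],
})
df.index = ["a", "b", "c", "d"]

# loc uses labels, and a label slice includes BOTH endpoints
by_label = df.loc["a":"c", "Artist":"Album"]

# iloc uses integer positions, and the stop position is excluded
by_position = df.iloc[0:3, 0:2]

# Both selections cover the same three rows and two columns
print(by_label.equals(by_position))

# The old integer labels no longer exist, so loc[0, ...] raises KeyError
try:
    df.loc[0, "Artist"]
    found = True
except KeyError:
    found = False
    print("label 0 not found; use df.loc['a', 'Artist'] or df.iloc[0, 0]")
```

This is why `df.loc[0, 'Artist']` works in the cell above only because it runs before the index is reassigned.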


In [37]:
df

| | Artist | Album | Released | Length | Genre | Music Recording Sales (millions) | Claimed Sales (millions) | Released.1 | Soundtrack | Rating |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Michael Jackson | Thriller | 1982 | 0:42:19 | pop, rock, R&B | 46.0 | 65 | 30-Nov-82 | | 10.0 |
| 1 | AC/DC | Back in Black | 1980 | 0:42:11 | hard rock | 26.1 | 50 | 25-Jul-80 | | 9.5 |
| 2 | Pink Floyd | The Dark Side of the Moon | 1973 | 0:42:49 | progressive rock | 24.2 | 45 | 01-Mar-73 | | 9.0 |
| 3 | Whitney Houston | The Bodyguard | 1992 | 0:57:44 | R&B, soul, pop | 27.4 | 44 | 17-Nov-92 | Y | 8.5 |
| 4 | Meat Loaf | Bat Out of Hell | 1977 | 0:46:33 | hard rock, progressive rock | 20.6 | 43 | 21-Oct-77 | | 8.0 |
| 5 | Eagles | Their Greatest Hits (1971-1975) | 1976 | 0:43:08 | rock, soft rock, folk rock | 32.2 | 42 | 17-Feb-76 | | 7.5 |
| 6 | Bee Gees | Saturday Night Fever | 1977 | 1:15:54 | disco | 20.6 | 40 | 15-Nov-77 | Y | 7.0 |
| 7 | Fleetwood Mac | Rumours | 1977 | 0:40:01 | soft rock | 27.9 | 40 | 04-Feb-77 | | 6.5 |
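
The `dtypes` output earlier in this section reports `Length` and `Released.1` as `object`, meaning they are stored as text. A possible next step, sketched here on three rows copied from the data, is to convert them to duration and date types so they can be compared and filtered. The date format string `"%d-%b-%y"` is an assumption based on values such as `30-Nov-82`, and the variable names are hypothetical.

```python
import pandas as pd

# Three rows copied from the album data; only the columns needed here
df = pd.DataFrame({
    "Artist": ["Michael Jackson", "AC/DC", "Bee Gees"],
    "Length": ["0:42:19", "0:42:11", "1:15:54"],
    "Released.1": ["30-Nov-82", "25-Jul-80", "15-Nov-77"],
    "Rating": [10.0, 9.5, 7.0],
})

# Text such as "0:42:19" becomes a timedelta (hours:minutes:seconds)
df["Length"] = pd.to_timedelta(df["Length"])

# Text such as "30-Nov-82" becomes a datetime; the format is an assumption
df["Released.1"] = pd.to_datetime(df["Released.1"], format="%d-%b-%y")

# With proper types, comparisons work as expected
long_albums = df[df["Length"] > pd.Timedelta(hours=1)]
top_rated = df.query("Rating >= 9.5")

print(df.dtypes)
print(long_albums["Artist"].tolist())
print(top_rated["Artist"].tolist())
```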
