# **Introduction To Pandas**
* Welcome to pandas, a powerful Python library for data manipulation and analysis!
* Pandas provides data structures and functions to efficiently handle structured data, including tabular data such as spreadsheets and SQL tables.
* With pandas, you can easily import, manipulate, and analyze data, making it an essential tool for data scientists and analysts.
* Pandas is built on top of the NumPy library and provides high-performance, easy-to-use data structures and operations.
* Whether you're working with a small table or a dataset of millions of rows that fits in memory, pandas is a good place to start.

In [1]:
# importing the pandas and numpy libraries
import numpy as np
import pandas as pd

In [2]:
# each key becomes a column name and each list holds that column's values
dict1={
    "names":['pavan','karthik','sanjay'],
    "marks":[80,81,78],
    "city":['bangalore','hyderabad','mumbai']
}

## **DataFrame**
* The **pd.DataFrame()** constructor in pandas creates a new DataFrame from a dictionary, a list, a NumPy array, or another data structure.
* It takes the data plus optional **index** (row) and **columns** labels and returns a DataFrame object.
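Besides a dictionary of columns, `pd.DataFrame()` also accepts data given row by row. The sketch below rebuilds the same friends data as a list of lists and as a list of dicts; the names `rows`, `records`, `df_rows` and `df_records`, and the custom labels `a`, `b`, `c`, are illustrative only.

```python
import pandas as pd

# Same data as dict1, but given row by row; column labels are passed separately
rows = [
    ["pavan", 80, "bangalore"],
    ["karthik", 81, "hyderabad"],
    ["sanjay", 78, "mumbai"],
]
df_rows = pd.DataFrame(rows, columns=["names", "marks", "city"])

# A list of dicts also works: each dict becomes one row, its keys become columns
records = [
    {"names": "pavan", "marks": 80, "city": "bangalore"},
    {"names": "karthik", "marks": 81, "city": "hyderabad"},
    {"names": "sanjay", "marks": 78, "city": "mumbai"},
]
df_records = pd.DataFrame(records, index=["a", "b", "c"])  # custom row labels

# Apart from the row labels, both frames hold the same data
print(df_rows.equals(df_records.reset_index(drop=True)))  # True
```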

In [3]:
df=pd.DataFrame(dict1) #Creates a dataframe

In [4]:
df

|   | names | marks | city |
|---|---|---|---|
| 0 | pavan | 80 | bangalore |
| 1 | karthik | 81 | hyderabad |
| 2 | sanjay | 78 | mumbai |


## **to_csv()**
* The **to_csv()** method in pandas is used to write a DataFrame to a comma-separated values (CSV) file.

In [5]:
df.to_csv('friends.csv')  # writes the DataFrame to friends.csv; the file name is passed as a string, and the row index is saved as the first column by default

In [6]:
df.to_csv('friendsdata.csv', index=False)  # index=False leaves out the row index column
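The difference between the two calls shows up when the file is read back. A minimal sketch, writing to a string instead of a file (calling `to_csv()` without a path returns the CSV text) and reading it with `io.StringIO`; the small DataFrame here is a stand-in for `df`:

```python
import io
import pandas as pd

df = pd.DataFrame({"names": ["pavan", "karthik", "sanjay"],
                   "marks": [80, 81, 78]})

# Default: the row index is written as an unnamed first column
with_index = df.to_csv()
print(with_index.splitlines()[0])  # ,names,marks

# Reading it back turns that unnamed column into "Unnamed: 0"
back = pd.read_csv(io.StringIO(with_index))
print(list(back.columns))  # ['Unnamed: 0', 'names', 'marks']

# Either skip the index when writing, or tell read_csv which column is the index
no_index = df.to_csv(index=False)
print(list(pd.read_csv(io.StringIO(no_index)).columns))  # ['names', 'marks']
print(list(pd.read_csv(io.StringIO(with_index), index_col=0).columns))  # ['names', 'marks']
```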

## **head()** 
* The **head()** method in pandas is used to display the first few rows of a DataFrame.
* **Parameter n**: the number of rows to display (default is 5).

In [7]:
df.head()  # returns the first 5 rows; df has only 3, so all of them are shown

|   | names | marks | city |
|---|---|---|---|
| 0 | pavan | 80 | bangalore |
| 1 | karthik | 81 | hyderabad |
| 2 | sanjay | 78 | mumbai |


In [8]:
df.head(2)  # returns the first two rows of the DataFrame

|   | names | marks | city |
|---|---|---|---|
| 0 | pavan | 80 | bangalore |
| 1 | karthik | 81 | hyderabad |


## **tail()** 
* The **tail()** method in pandas is used to display the last few rows of a DataFrame.
* **Parameter n**: the number of rows to display (default is 5).

In [9]:
df.tail()  # returns the last 5 rows; df has only 3, so all of them are shown

|   | names | marks | city |
|---|---|---|---|
| 0 | pavan | 80 | bangalore |
| 1 | karthik | 81 | hyderabad |
| 2 | sanjay | 78 | mumbai |


In [10]:
df.tail(2)  # returns the last two rows of the DataFrame

|   | names | marks | city |
|---|---|---|---|
| 1 | karthik | 81 | hyderabad |
| 2 | sanjay | 78 | mumbai |
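`head()` and `tail()` also accept a negative `n`, which counts from the other end. A short sketch on a hypothetical five-row frame:

```python
import pandas as pd

df = pd.DataFrame({"marks": [80, 81, 78, 90, 65]})

print(df.head(-2))  # every row except the last 2:  marks 80, 81, 78
print(df.tail(-2))  # every row except the first 2: marks 78, 90, 65
print(df.head(10))  # n larger than the frame simply returns all 5 rows
```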


## **describe()**
* The **describe()** method in pandas generates descriptive statistics for a DataFrame: count, mean, standard deviation, min, max, and quartiles. By default only the numeric columns are summarized.
* It provides a concise summary of the central tendency and variability of the data.
* This method is useful for understanding the distribution and characteristics of the data.

In [11]:
# df.describe() gives summary statistics for the numeric columns only (here just marks)
df.describe()

|   | marks |
|---|---|
| count | 3.0 |
| mean | 79.666667 |
| std | 1.527525 |
| min | 78.0 |
| 25% | 79.0 |
| 50% | 80.0 |
| 75% | 80.5 |
| max | 81.0 |
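The text columns `names` and `city` were left out above. A sketch of two `describe()` parameters that change this, `include` and `percentiles`, on the same friends data:

```python
import pandas as pd

df = pd.DataFrame({
    "names": ["pavan", "karthik", "sanjay"],
    "marks": [80, 81, 78],
    "city": ["bangalore", "hyderabad", "mumbai"],
})

# Only numeric columns are summarised by default
print(df.describe().columns.tolist())  # ['marks']

# include="all" also summarises text columns (count, unique, top, freq)
summary = df.describe(include="all")
print(summary.loc["unique", "city"])  # 3

# percentiles replaces the default 25%/75% quartiles with custom ones
print(df.describe(percentiles=[0.1, 0.9]).index.tolist())
```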


## **read_csv()**
* The **pd.read_csv()** function in pandas reads a comma-separated values (CSV) file into a DataFrame.
* It takes the file path or buffer as input and returns a DataFrame object.
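Since `read_csv()` accepts a buffer as well as a path, it can be tried without a file on disk. A minimal sketch, assuming a short in-memory stand-in for `trains.csv` (its first two rows copied from the output below), that also uses the `usecols` and `index_col` parameters:

```python
import io
import pandas as pd

# Stand-in for trains.csv, whose full contents are not shown here
csv_text = """Train no,Speed ,City
12333,25,Kacheguda
12456,125,Jabalpur
"""

trains_demo = pd.read_csv(io.StringIO(csv_text))
print(trains_demo.shape)  # (2, 3)

# usecols keeps only the listed columns; index_col turns one of them into the row index
by_number = pd.read_csv(io.StringIO(csv_text), usecols=["Train no", "City"],
                        index_col="Train no")
print(by_number.loc[12456, "City"])  # Jabalpur
```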

In [12]:
trains=pd.read_csv('trains.csv') # reads a csv file and converts it to a dataframe

In [13]:
trains

|   | Train no | Speed | City |
|---|---|---|---|
| 0 | 12333 | 25 | Kacheguda |
| 1 | 12456 | 125 | Jabalpur |
| 2 | 17652 | 99 | Chennai |
| 3 | 12708 | 80 | Mysore |


In [14]:
trains['Train no'] 

0    12333
1    12456
2    17652
3    12708
Name: Train no, dtype: int64

In [15]:
trains

|   | Train no | Speed | City |
|---|---|---|---|
| 0 | 12333 | 25 | Kacheguda |
| 1 | 12456 | 125 | Jabalpur |
| 2 | 17652 | 99 | Chennai |
| 3 | 12708 | 80 | Mysore |


In [16]:
trains['Speed ']  # the column name has a trailing space, as in the CSV header

0     25
1    125
2     99
3     80
Name: Speed , dtype: int64
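The trailing space in `'Speed '` comes straight from the CSV header, because `read_csv()` keeps header text as written. One common fix is to strip whitespace from all column labels after loading; a sketch on a hypothetical two-row CSV with the same header:

```python
import io
import pandas as pd

# Hypothetical CSV whose header has a trailing space after "Speed"
csv_text = "Train no,Speed ,City\n12333,25,Kacheguda\n12456,125,Jabalpur\n"
demo = pd.read_csv(io.StringIO(csv_text))
print(list(demo.columns))  # ['Train no', 'Speed ', 'City']

# Strip leading/trailing whitespace from every column label
demo.columns = demo.columns.str.strip()
print(demo["Speed"].tolist())  # [25, 125]
```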

In [17]:
trains['Speed '][0]=65

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  trains['Speed '][0]=65
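The warning appears because `trains['Speed '][0]=65` is a chained assignment: `trains['Speed ']` first returns a Series, and `[0]=65` then writes into that intermediate object. In this run the change did reach `trains`, but the pandas documentation states that under Copy-on-Write (the default behaviour from pandas 3.0) chained assignment never updates the original DataFrame. A sketch of the single-step `.loc` form, using a small hypothetical stand-in for `trains`:

```python
import pandas as pd

# Hypothetical two-row stand-in for the trains DataFrame
trains_demo = pd.DataFrame({"Train no": [12333, 12456], "Speed ": [25, 125]})

# One .loc call with (row label, column label) writes directly into the frame
trains_demo.loc[0, "Speed "] = 65
print(trains_demo.loc[0, "Speed "])  # 65
```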


In [18]:
trains

|   | Train no | Speed | City |
|---|---|---|---|
| 0 | 12333 | 65 | Kacheguda |
| 1 | 12456 | 125 | Jabalpur |
| 2 | 17652 | 99 | Chennai |
| 3 | 12708 | 80 | Mysore |


# **Series and DataFrames**
### ***Series:***
* **One-dimensional labeled array:** A Series is a one-dimensional array of values with a label for each value, called the index.
* **Similar to a column in a spreadsheet:** A Series can be thought of as a single column in a spreadsheet, with the index serving as the row labels.

### ***DataFrames:***
* **Two-dimensional labeled data structure:** A DataFrame is a two-dimensional table of values with rows and columns, similar to an Excel spreadsheet or a table in a relational database.
* **Collection of Series:** A DataFrame can be thought of as a collection of Series, where each Series represents a column in the DataFrame.
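The relationship between the two structures can be seen in a few lines. A sketch with illustrative names (`marks`, `city`, `people`): a Series carries an index of labels, a DataFrame built from several Series aligns them on those labels, and selecting one column gives a Series back.

```python
import pandas as pd

# A Series: values plus an index of labels
marks = pd.Series([80, 81, 78], index=["pavan", "karthik", "sanjay"], name="marks")
print(marks["karthik"])  # 81

# A DataFrame built from Series aligns them on their index labels
city = pd.Series(["mumbai", "bangalore"], index=["sanjay", "pavan"], name="city")
people = pd.DataFrame({"marks": marks, "city": city})
print(people.loc["pavan", "city"])    # bangalore
print(people.loc["karthik", "city"])  # missing value: karthik is not in the city Series

# Selecting one column of a DataFrame gives back a Series
print(type(people["marks"]))
```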


In [19]:
ser=pd.Series(np.random.rand(10))

In [20]:
ser

0    0.055497
1    0.740812
2    0.045880
3    0.162554
4    0.048984
5    0.613659
6    0.888503
7    0.287144
8    0.576479
9    0.517802
dtype: float64

In [21]:
type(ser)

pandas.core.series.Series

In [22]:
newdf=pd.DataFrame(np.random.rand(335,7),index=np.arange(335))

In [23]:
newdf.head(5)

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | 0.846513 | 0.013579 | 0.659622 | 0.065013 | 0.550563 | 0.341911 | 0.797594 |
| 1 | 0.074951 | 0.76141 | 0.319567 | 0.694019 | 0.730952 | 0.051327 | 0.165704 |
| 2 | 0.759315 | 0.364142 | 0.471356 | 0.006423 | 0.422011 | 0.697226 | 0.860157 |
| 3 | 0.103562 | 0.069411 | 0.091427 | 0.668558 | 0.78192 | 0.577655 | 0.26931 |
| 4 | 0.121605 | 0.43444 | 0.134807 | 0.147778 | 0.874878 | 0.159727 | 0.727686 |


In [24]:
type(newdf)

pandas.core.frame.DataFrame

In [25]:
newdf.describe()

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| count | 335.0 | 335.0 | 335.0 | 335.0 | 335.0 | 335.0 | 335.0 |
| mean | 0.491538 | 0.514969 | 0.487347 | 0.535592 | 0.533533 | 0.495274 | 0.499644 |
| std | 0.28839 | 0.278341 | 0.282535 | 0.27607 | 0.287543 | 0.284315 | 0.302688 |
| min | 0.001976 | 0.004235 | 0.000729 | 0.001909 | 0.003608 | 0.003891 | 0.001613 |
| 25% | 0.229982 | 0.294487 | 0.237029 | 0.306348 | 0.291101 | 0.2559 | 0.222509 |
| 50% | 0.487663 | 0.527727 | 0.480614 | 0.563077 | 0.555256 | 0.470557 | 0.498866 |
| 75% | 0.763139 | 0.747122 | 0.722217 | 0.765861 | 0.777969 | 0.738853 | 0.774254 |
| max | 0.999319 | 0.99895 | 0.998815 | 0.996594 | 0.998273 | 0.990981 | 0.999887 |


## **to_numpy()**
* The **to_numpy()** method in pandas is used to convert a DataFrame to a NumPy array.
* It returns a NumPy array representation of the DataFrame's values.

In [26]:
newdf.to_numpy() #creates a numpy array from a dataframe

array([[0.84651286, 0.01357864, 0.65962195, ..., 0.5505628 , 0.34191054,
        0.79759378],
       [0.07495071, 0.76141015, 0.31956651, ..., 0.73095227, 0.0513266 ,
        0.16570445],
       [0.75931543, 0.36414235, 0.4713564 , ..., 0.42201135, 0.69722645,
        0.86015743],
       ...,
       [0.2215315 , 0.20551908, 0.05677297, ..., 0.5958663 , 0.69774761,
        0.36327749],
       [0.2490728 , 0.98778732, 0.38031226, ..., 0.69229079, 0.50387139,
        0.7485847 ],
       [0.68562821, 0.3528717 , 0.46193795, ..., 0.85973313, 0.75453082,
        0.38774006]], shape=(335, 7))
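Because a NumPy array has a single dtype, `to_numpy()` has to pick one type that fits every column. A sketch on two small hypothetical frames, plus the `dtype` parameter that requests a specific result type:

```python
import numpy as np
import pandas as pd

numeric = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})
arr = numeric.to_numpy()
print(arr.dtype)  # float64: the int column is upcast to match the float column

mixed = pd.DataFrame({"names": ["pavan", "karthik"], "marks": [80, 81]})
print(mixed.to_numpy().dtype)  # object: text and numbers share no numeric type

# dtype= asks for a specific result type
print(numeric.to_numpy(dtype=np.float32).dtype)  # float32
```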

In [27]:
newdf.head(5)

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | 0.846513 | 0.013579 | 0.659622 | 0.065013 | 0.550563 | 0.341911 | 0.797594 |
| 1 | 0.074951 | 0.76141 | 0.319567 | 0.694019 | 0.730952 | 0.051327 | 0.165704 |
| 2 | 0.759315 | 0.364142 | 0.471356 | 0.006423 | 0.422011 | 0.697226 | 0.860157 |
| 3 | 0.103562 | 0.069411 | 0.091427 | 0.668558 | 0.78192 | 0.577655 | 0.26931 |
| 4 | 0.121605 | 0.43444 | 0.134807 | 0.147778 | 0.874878 | 0.159727 | 0.727686 |


## **shape**
* The **shape** attribute in pandas returns a tuple representing the dimensionality of a DataFrame, with the first element being the number of rows and the second element being the number of columns.

In [28]:
newdf.shape  # a (rows, columns) tuple; here the DataFrame has 335 rows and 7 columns

(335, 7)
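Since `shape` is a plain tuple, it can be unpacked directly. A sketch on a hypothetical frame of the same size as `newdf`, alongside two related size checks, `len()` and the `size` attribute:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.zeros((335, 7)))
n_rows, n_cols = frame.shape  # tuple unpacking
print(n_rows, n_cols)  # 335 7
print(len(frame))      # 335, the same as shape[0]
print(frame.size)      # 2345, the total number of cells (335 * 7)
```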

## **T**
* The **T** attribute in pandas is used to transpose a DataFrame, swapping its rows with columns.

In [29]:
newdf.T # transpose of dataframe 

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 325 | 326 | 327 | 328 | 329 | 330 | 331 | 332 | 333 | 334 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.846513 | 0.074951 | 0.759315 | 0.103562 | 0.121605 | 0.210655 | 0.623769 | 0.765265 | 0.108733 | 0.559263 | ... | 0.263787 | 0.113626 | 0.564724 | 0.504291 | 0.86535 | 0.933665 | 0.159701 | 0.221531 | 0.249073 | 0.685628 |
| 1 | 0.013579 | 0.76141 | 0.364142 | 0.069411 | 0.43444 | 0.137453 | 0.91646 | 0.950852 | 0.658165 | 0.701777 | ... | 0.013772 | 0.016106 | 0.045588 | 0.948791 | 0.691163 | 0.481855 | 0.210105 | 0.205519 | 0.987787 | 0.352872 |
| 2 | 0.659622 | 0.319567 | 0.471356 | 0.091427 | 0.134807 | 0.328967 | 0.445924 | 0.676496 | 0.511414 | 0.710298 | ... | 0.293416 | 0.798024 | 0.306096 | 0.786709 | 0.480614 | 0.178135 | 0.651405 | 0.056773 | 0.380312 | 0.461938 |
| 3 | 0.065013 | 0.694019 | 0.006423 | 0.668558 | 0.147778 | 0.929371 | 0.904637 | 0.76648 | 0.003827 | 0.575017 | ... | 0.43599 | 0.621947 | 0.151997 | 0.552849 | 0.916567 | 0.068841 | 0.365752 | 0.74345 | 0.838213 | 0.163125 |
| 4 | 0.550563 | 0.730952 | 0.422011 | 0.78192 | 0.874878 | 0.245874 | 0.947836 | 0.589891 | 0.415068 | 0.361993 | ... | 0.940867 | 0.946277 | 0.119146 | 0.063122 | 0.695139 | 0.635606 | 0.2225 | 0.595866 | 0.692291 | 0.859733 |
| 5 | 0.341911 | 0.051327 | 0.697226 | 0.577655 | 0.159727 | 0.732286 | 0.360253 | 0.092577 | 0.1965 | 0.256142 | ... | 0.2653 | 0.975974 | 0.701881 | 0.087602 | 0.132727 | 0.382393 | 0.121769 | 0.697748 | 0.503871 | 0.754531 |
| 6 | 0.797594 | 0.165704 | 0.860157 | 0.26931 | 0.727686 | 0.54024 | 0.920396 | 0.693787 | 0.584695 | 0.101624 | ... | 0.872741 | 0.977862 | 0.430889 | 0.024837 | 0.892816 | 0.095076 | 0.107463 | 0.363277 | 0.748585 | 0.38774 |
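`.T` is shorthand for the `transpose()` method. A sketch on a small hypothetical mixed-type frame, which also shows one side effect worth knowing: after transposing, each new column mixes text and numbers, so pandas stores them as `object`, and transposing back does not restore the original dtypes.

```python
import pandas as pd

df = pd.DataFrame({"names": ["pavan", "karthik"], "marks": [80, 81]})

t = df.T                          # the old column labels become the row index
print(t.index.tolist())           # ['names', 'marks']
print(t.equals(df.transpose()))   # True: .T and transpose() give the same result

# Transposing twice restores the shape, but both columns are now object dtype
print(df.T.T.dtypes.tolist())
```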


## **sort_index()**
* The **sort_index()** method in pandas sorts a DataFrame by its labels: the row index by default (axis=0), or the column labels with axis=1, in ascending or descending order. It returns a new sorted DataFrame and leaves the original unchanged (unless inplace=True is passed).

In [30]:
newdf.sort_index(axis=1,ascending=False)

|   | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|---|---|---|---|---|---|---|---|
| 0 | 0.797594 | 0.341911 | 0.550563 | 0.065013 | 0.659622 | 0.013579 | 0.846513 |
| 1 | 0.165704 | 0.051327 | 0.730952 | 0.694019 | 0.319567 | 0.761410 | 0.074951 |
| 2 | 0.860157 | 0.697226 | 0.422011 | 0.006423 | 0.471356 | 0.364142 | 0.759315 |
| 3 | 0.269310 | 0.577655 | 0.781920 | 0.668558 | 0.091427 | 0.069411 | 0.103562 |
| 4 | 0.727686 | 0.159727 | 0.874878 | 0.147778 | 0.134807 | 0.434440 | 0.121605 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 330 | 0.095076 | 0.382393 | 0.635606 | 0.068841 | 0.178135 | 0.481855 | 0.933665 |
| 331 | 0.107463 | 0.121769 | 0.222500 | 0.365752 | 0.651405 | 0.210105 | 0.159701 |
| 332 | 0.363277 | 0.697748 | 0.595866 | 0.743450 | 0.056773 | 0.205519 | 0.221531 |
| 333 | 0.748585 | 0.503871 | 0.692291 | 0.838213 | 0.380312 | 0.987787 | 0.249073 |
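`newdf` already has a sorted row index, so only the column sort is visible above. A sketch on a hypothetical frame whose row and column labels are both out of order:

```python
import pandas as pd

df = pd.DataFrame({"b": [1, 2, 3], "a": [4, 5, 6]}, index=[2, 0, 1])

print(df.sort_index().index.tolist())                 # [0, 1, 2]: rows sorted by label
print(df.sort_index(axis=1).columns.tolist())         # ['a', 'b']: columns sorted by label
print(df.sort_index(ascending=False).index.tolist())  # [2, 1, 0]
print(df.index.tolist())                              # [2, 0, 1]: the original is untouched
```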


## **copy()**
* The **copy()** method in pandas creates a deep copy of a DataFrame by default (deep=True): a new, independent object whose data can be changed without affecting the original, rather than just another reference to it.


In [31]:
newdf2=newdf.copy() #makes a copy of dataframe 

In [32]:
newdf2.head()

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | 0.846513 | 0.013579 | 0.659622 | 0.065013 | 0.550563 | 0.341911 | 0.797594 |
| 1 | 0.074951 | 0.76141 | 0.319567 | 0.694019 | 0.730952 | 0.051327 | 0.165704 |
| 2 | 0.759315 | 0.364142 | 0.471356 | 0.006423 | 0.422011 | 0.697226 | 0.860157 |
| 3 | 0.103562 | 0.069411 | 0.091427 | 0.668558 | 0.78192 | 0.577655 | 0.26931 |
| 4 | 0.121605 | 0.43444 | 0.134807 | 0.147778 | 0.874878 | 0.159727 | 0.727686 |
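The difference between a copy and a plain reference is easiest to see side by side. A sketch with illustrative names (`original`, `alias`, `independent`):

```python
import pandas as pd

original = pd.DataFrame({"marks": [80, 81, 78]})

alias = original               # plain assignment: a second name for the same object
independent = original.copy()  # deep copy (deep=True is the default)

independent.loc[0, "marks"] = 100
print(original.loc[0, "marks"])  # 80: the change to the copy does not leak back

alias.loc[0, "marks"] = 99
print(original.loc[0, "marks"])  # 99: alias and original are the same object
print(alias is original)         # True
```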


## **loc[] accessor**
* The **loc[]** accessor in pandas is used to access a group of rows and columns by label(s) or a boolean array.
* Syntax: **df.loc[row_selection, column_selection]**
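In `newdf2` the row and column labels happen to equal their positions, which hides the fact that `loc[]` works on labels. A sketch on a hypothetical frame indexed by name, showing a single label, a boolean mask, a label slice, and the position-based `iloc[]` for contrast:

```python
import pandas as pd

df = pd.DataFrame(
    {"marks": [80, 81, 78], "city": ["bangalore", "hyderabad", "mumbai"]},
    index=["pavan", "karthik", "sanjay"],
)

print(df.loc["karthik", "city"])                    # hyderabad (row label, column label)
print(df.loc[df["marks"] > 79, "city"].tolist())    # ['bangalore', 'hyderabad']
print(df.loc["pavan":"karthik", "marks"].tolist())  # [80, 81]: label slices include the end
print(df.iloc[2, 0])                                # 78: iloc selects by position instead
```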


In [33]:
newdf2.loc[0,1]=264  # loc[row_label, column_label] selects by label; here the cell in row 0, column 1 is set to 264

In [34]:
newdf2.head(2)

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | 0.846513 | 264.0 | 0.659622 | 0.065013 | 0.550563 | 0.341911 | 0.797594 |
| 1 | 0.074951 | 0.76141 | 0.319567 | 0.694019 | 0.730952 | 0.051327 | 0.165704 |


In [35]:
newdf2.loc[0,'A']=12  # 'A' is not an existing column, so loc adds it; every other row gets NaN

In [36]:
newdf2.head(2)

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | A |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.846513 | 264.0 | 0.659622 | 0.065013 | 0.550563 | 0.341911 | 0.797594 | 12.0 |
| 1 | 0.074951 | 0.76141 | 0.319567 | 0.694019 | 0.730952 | 0.051327 | 0.165704 | NaN |


In [37]:
newdf2.drop('A',axis=1).head(3)  # drop() returns a new DataFrame; newdf2 itself still has column 'A'

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | 0.846513 | 264.0 | 0.659622 | 0.065013 | 0.550563 | 0.341911 | 0.797594 |
| 1 | 0.074951 | 0.76141 | 0.319567 | 0.694019 | 0.730952 | 0.051327 | 0.165704 |
| 2 | 0.759315 | 0.364142 | 0.471356 | 0.006423 | 0.422011 | 0.697226 | 0.860157 |


In [38]:
newdf.drop([0],axis=1)

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 0 | 0.013579 | 0.659622 | 0.065013 | 0.550563 | 0.341911 | 0.797594 |
| 1 | 0.761410 | 0.319567 | 0.694019 | 0.730952 | 0.051327 | 0.165704 |
| 2 | 0.364142 | 0.471356 | 0.006423 | 0.422011 | 0.697226 | 0.860157 |
| 3 | 0.069411 | 0.091427 | 0.668558 | 0.781920 | 0.577655 | 0.269310 |
| 4 | 0.434440 | 0.134807 | 0.147778 | 0.874878 | 0.159727 | 0.727686 |
| ... | ... | ... | ... | ... | ... | ... |
| 330 | 0.481855 | 0.178135 | 0.068841 | 0.635606 | 0.382393 | 0.095076 |
| 331 | 0.210105 | 0.651405 | 0.365752 | 0.222500 | 0.121769 | 0.107463 |
| 332 | 0.205519 | 0.056773 | 0.743450 | 0.595866 | 0.697748 | 0.363277 |
| 333 | 0.987787 | 0.380312 | 0.838213 | 0.692291 | 0.503871 | 0.748585 |
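The two `drop()` calls above select the axis with `axis=1`. pandas also accepts the `columns=` and `index=` keywords, which some readers find clearer. A sketch on a hypothetical 3x3 frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

no_b = df.drop(columns="b")      # same as df.drop("b", axis=1)
no_first_row = df.drop(index=0)  # same as df.drop(0, axis=0)
print(no_b.columns.tolist())        # ['a', 'c']
print(no_first_row.index.tolist())  # [1, 2]
print(df.shape)                     # (3, 3): df itself is unchanged
```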


## **info()**
* The **info()** method in pandas provides a concise summary of a DataFrame, including the index dtype, column dtypes, and memory usage.
* It also shows the number of non-null values in each column.

In [39]:
newdf.info()

<class 'pandas.core.frame.DataFrame'>
Index: 335 entries, 0 to 334
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       335 non-null    float64
 1   1       335 non-null    float64
 2   2       335 non-null    float64
 3   3       335 non-null    float64
 4   4       335 non-null    float64
 5   5       335 non-null    float64
 6   6       335 non-null    float64
dtypes: float64(7)
memory usage: 29.0 KB
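`info()` prints to standard output, but its `buf` parameter can redirect the summary, for example to check non-null counts in code. A sketch on a hypothetical frame with one missing name; `memory_usage(deep=True)` is included because the plain `info()` figure does not count the memory held by Python string objects:

```python
import io
import pandas as pd

df = pd.DataFrame({"names": ["pavan", "karthik", None], "marks": [80, 81, 78]})

buf = io.StringIO()
df.info(buf=buf)                # write the summary to the buffer instead of stdout
summary = buf.getvalue()
print("2 non-null" in summary)  # True: one name is missing

# deep=True also measures the Python string objects in text columns
print(df.memory_usage(deep=True))
```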
