# Introduction to Pandas

Pandas is a popular open-source library for data analysis and manipulation. If you're working with data in Python, you'll likely start with Pandas.

Pandas has two primary data structures:

1.   Series (1-dimensional data)
2.   DataFrame (2-dimensional data)

We'll mostly be using the DataFrame (2-dimensional) structure.
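To make the difference between the two structures concrete, here is a minimal sketch. The variable names `vehicle_age` and `cars` are made up for illustration, and the values are borrowed from the first rows of the car auction data used later in this notebook.

```python
import pandas as pd

# A Series is a single labeled column of values (1-dimensional).
vehicle_age = pd.Series([8, 8, 4], name="VehicleAge")

# A DataFrame is a table of rows and columns (2-dimensional).
cars = pd.DataFrame({
    "Color": ["WHITE", "GOLD", "RED"],
    "VehicleAge": [8, 8, 4],
})

print(vehicle_age.ndim)  # 1
print(cars.ndim)         # 2

# Selecting one column of a DataFrame gives back a Series.
print(type(cars["Color"]))
```

In practice, a DataFrame can be thought of as a collection of Series that share the same row labels.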





# Check for Pandas package

First, we'll check to see if Pandas is already installed.

The output below tells us that pandas version 2.1.4 (current as of this writing) is already installed.

In [1]:
!pip show pandas

Name: pandas
Version: 2.1.4
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author: 
Author-email: The Pandas Development Team <pandas-dev@python.org>
License: BSD 3-Clause License
        
        Copyright (c) 2008-2011, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
        All rights reserved.
        
        Copyright (c) 2011-2023, Open source contributors.
        
        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are met:
        
        * Redistributions of source code must retain the above copyright notice, this
          list of conditions and the following disclaimer.
        
        * Redistributions in binary form must reproduce the above copyright notice,
          this list of conditions and the following disclaimer in the documentation
          and/or other materials 

# Import pandas

To use pandas, we must first import it. By convention we give it the alias pd, which makes it shorter to refer to, much like calling someone Matt instead of Matthew.

In [3]:
import pandas as pd
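Once pandas is imported, the installed version can also be checked from Python itself. This is a small sketch of an alternative to the pip command, and the version printed will depend on your environment.

```python
import pandas as pd

# Every pandas release exposes its version number as a string.
print(pd.__version__)
```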

# Import data into Pandas

We'll import a CSV file into a Pandas DataFrame. The file is hosted on GitHub, and pd.read_csv can read it directly from its URL.

In [6]:
car_data = pd.read_csv('https://raw.githubusercontent.com/matthewpecsok/data_engineering/main/data/carAuction.csv')

We can use the type() function to ask Python what type of object car_data is. As expected, it's a Pandas DataFrame.

In [7]:
type(car_data)
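If you want to try read_csv without a network connection, the following sketch reads CSV text from memory instead of a URL. The name `sample` is hypothetical, and the text holds only three rows and five of the columns, copied from the head() output shown later in this notebook.

```python
import io

import pandas as pd

# A few rows in CSV form: the first line is the header, the rest are data.
csv_text = """Auction,Color,IsBadBuy,VehBCost,VehicleAge
ADESA,WHITE,No,5300,8
ADESA,GOLD,Yes,3600,8
ADESA,RED,No,7500,4
"""

# read_csv accepts a URL, a file path, or a file-like object such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

print(type(sample))
print(sample)
```

The result is the same kind of object as car_data, just much smaller.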

# What is a Pandas DataFrame?

It's effectively columns and rows of data. You can think of it like a spreadsheet. In fact, you can import the same CSV file into Excel to view the dataset.

# dataframe.shape

If a DataFrame is columns and rows, we might ask: how many of each? The shape attribute answers that. It returns a tuple, written with parentheses (), containing the number of rows and columns: (row_count, column_count).

We have 10,000 rows and 11 columns.

In [8]:
car_data.shape

(10000, 11)
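Because shape is a tuple, its two values can be unpacked into separate variables. Here is a minimal sketch on a tiny made-up DataFrame called `sample`, which is an illustration and not the full car auction data.

```python
import pandas as pd

sample = pd.DataFrame({
    "Color": ["WHITE", "GOLD", "RED"],
    "VehBCost": [5300, 3600, 7500],
})

# shape is an attribute, so there are no parentheses after it.
row_count, column_count = sample.shape

print(sample.shape)   # (3, 2)
print(row_count)      # 3
print(column_count)   # 2
print(len(sample))    # 3, because len() counts rows
```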

# What are the column names?

Next, we might be curious what the column names are. We can use car_data.columns to find out. This gives us an Index object holding all of the column names, displayed inside square brackets like a Python list. Lists are defined in Python with [ ]. A list is editable (mutable); a tuple is not (it's immutable).

The columns, in order, are named:

['Auction', 'Color', 'IsBadBuy', 'MMRCurrentAuctionAveragePrice', 'Size',
       'TopThreeAmericanName', 'VehBCost', 'VehicleAge', 'VehOdo',
       'WarrantyCost', 'WheelType']
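The difference between a list and a tuple can be sketched as follows. The tiny `sample` DataFrame and the names `names` and `dims` are made up for illustration.

```python
import pandas as pd

sample = pd.DataFrame({"Auction": ["ADESA"], "Color": ["WHITE"]})

# .columns is a pandas Index; .tolist() converts it to a plain Python list.
names = sample.columns.tolist()
names.append("Size")          # lists can be changed in place
print(names)                  # ['Auction', 'Color', 'Size']

# A tuple cannot be changed after it is created.
dims = sample.shape           # (1, 2)
try:
    dims[0] = 99
except TypeError as err:
    print("tuples are immutable:", err)
```

Note that appending to `names` changes only the list, not the DataFrame's columns.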




### Strings in Python

Words and text in Python are enclosed in single or double quotes. So the column name Auction is enclosed in '' to tell us it is a string object.
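As a quick illustration, single and double quotes produce the same kind of object. The variable names here are made up.

```python
single = 'Auction'
double = "Auction"

print(single == double)   # True
print(type(single))       # <class 'str'>
```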

In [9]:
car_data.columns

Index(['Auction', 'Color', 'IsBadBuy', 'MMRCurrentAuctionAveragePrice', 'Size',
       'TopThreeAmericanName', 'VehBCost', 'VehicleAge', 'VehOdo',
       'WarrantyCost', 'WheelType'],
      dtype='object')

# dataframe.head

To see the first few rows of the DataFrame, we can use the head method. By default it returns the first 5 rows; we can pass n to see more or fewer.

In [10]:
car_data.head()

,Auction,Color,IsBadBuy,MMRCurrentAuctionAveragePrice,Size,TopThreeAmericanName,VehBCost,VehicleAge,VehOdo,WarrantyCost,WheelType
0,ADESA,WHITE,No,2871,LARGE TRUCK,FORD,5300,8,75419,869,Alloy
1,ADESA,GOLD,Yes,1840,VAN,FORD,3600,8,82944,2322,Alloy
2,ADESA,RED,No,8931,SMALL SUV,CHRYSLER,7500,4,57338,588,Alloy
3,ADESA,GOLD,No,8320,CROSSOVER,FORD,8500,5,55909,1169,Alloy
4,ADESA,GREY,No,11520,LARGE TRUCK,FORD,10100,5,86702,853,Alloy


In [11]:
car_data.head(n=3)

,Auction,Color,IsBadBuy,MMRCurrentAuctionAveragePrice,Size,TopThreeAmericanName,VehBCost,VehicleAge,VehOdo,WarrantyCost,WheelType
0,ADESA,WHITE,No,2871,LARGE TRUCK,FORD,5300,8,75419,869,Alloy
1,ADESA,GOLD,Yes,1840,VAN,FORD,3600,8,82944,2322,Alloy
2,ADESA,RED,No,8931,SMALL SUV,CHRYSLER,7500,4,57338,588,Alloy
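How n affects head can be checked on a small made-up DataFrame. This is a sketch in which `numbers` is a hypothetical 10-row table, not the car auction data.

```python
import pandas as pd

numbers = pd.DataFrame({"value": range(10)})

print(len(numbers.head()))      # 5, the default
print(len(numbers.head(n=3)))   # 3
print(len(numbers.head(20)))    # 10, head never returns more rows than exist
```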
