# Basics

Some of pandas' basic functionality.

- [Series](#series)
- [Data Frames](#data-frames)
- [Importing/Exporting Data](#importing-data--exporting-data)
  - [Importing Data](#importing-data)
  - [Exporting Data](#exporting-data)
- [Describing Data](#describing-data)
- [Operating with Data](#operating-with-data)


In [9]:
import pandas as pd

# Series

Series are the first of the two main types of data in pandas. They are a one-dimensional labelled array capable of holding various data types like integers, strings, floating-point numbers, and more. They are very much like a single column from a spreadsheet.

The main argument passed to `pd.Series` is usually a list. Each element of the list becomes the value of one row in the Series.


In [10]:
# Creating series
series = pd.Series(['BMW', 'Toyota', 'Honda'])
series

0       BMW
1    Toyota
2     Honda
dtype: object

In [11]:
colours = pd.Series(['Red', 'Blue', 'White'])
colours

0      Red
1     Blue
2    White
dtype: object
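
Because a Series is a labelled array, each value can be looked up by its label. The sketch below is only illustrative: the labels `car_a`, `car_b`, and `car_c` are made up for this example, and the values are borrowed from the odometer readings used later in these notes.

```python
import pandas as pd

# A Series built from a plain list gets a default integer index: 0, 1, 2, ...
odometer = pd.Series([150043, 87899, 32549])

# Passing `index=` attaches custom labels (these labels are hypothetical)
odometer_labelled = pd.Series(
    [150043, 87899, 32549],
    index=['car_a', 'car_b', 'car_c'],
)

print(list(odometer.index))        # [0, 1, 2]
print(odometer_labelled['car_b'])  # 87899
```

The default integer index is usually enough for quick exploration; custom labels mostly help when rows have a natural identifier.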

# Data Frames

Unlike Series, Data Frames are two-dimensional, size-mutable, labelled data structures whose columns can hold different data types. Essentially, a Data Frame is like a spreadsheet with rows and columns.

A Data Frame is usually created from a dictionary. The keys of the dictionary become the column names, and the value stored under each key supplies that column's data, one entry per row. Since most Data Frames need more than a single row, it's a good idea to pass Series (or lists) as the dictionary values, which makes building full tables more seamless.


In [12]:
car_data = pd.DataFrame({})
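
As a minimal sketch of combining Series into a Data Frame, the example below passes two Series as dictionary values. The column names `'Car make'` and `'Colour'` are assumptions chosen for illustration; any strings would work as keys.

```python
import pandas as pd

makes = pd.Series(['BMW', 'Toyota', 'Honda'])
colours = pd.Series(['Red', 'Blue', 'White'])

# Keys become column names; each Series supplies the values of its column
car_data = pd.DataFrame({'Car make': makes, 'Colour': colours})

print(car_data.shape)             # (3, 2)
print(car_data.loc[1, 'Colour'])  # Blue
```

The Series are aligned by their index, so rows with the same label end up on the same row of the Data Frame.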

# Importing Data / Exporting Data

## Importing Data

Although creating a Data Frame from scratch is sometimes desirable, it usually isn't. With this in mind, pandas comes equipped with functions that import data from a variety of sources.

Some of those sources include:

- CSV (Comma Separated Values)
- Excel files
- JSON files
- Databases\* (additional tools are required for this)
- text files
- etc.

For now, let's stick to CSV files.

Note: Some functions refer to rows and columns as axis 0 and axis 1, respectively.
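
To make the axis convention concrete, here is a small sketch using `.sum()`. The Data Frame and its `'Seats'` column are invented for this illustration and are not part of the car sales dataset.

```python
import pandas as pd

# Hypothetical data, only for demonstrating the axis argument
df = pd.DataFrame({'Doors': [4, 4, 3], 'Seats': [5, 5, 2]})

# axis=0 works down the rows, giving one result per column
col_totals = df.sum(axis=0)

# axis=1 works across the columns, giving one result per row
row_totals = df.sum(axis=1)

print(col_totals['Doors'])  # 11
print(list(row_totals))     # [9, 9, 5]
```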

## Exporting Data

To export the data in a pandas object, use the method that corresponds to the target file type. For example, to export to a `.csv` file, use the `.to_csv('{path of exported file}')` method. As with the import functions, the file type appears in the method's name.

Note: When exporting a Data Frame, set the `index=` parameter, available in most of these methods, to `False` so that the Data Frame's index is not exported along with the data.
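
The effect of the `index=` parameter can be seen with a quick round trip. This sketch writes to in-memory strings through `io.StringIO` instead of real files so that it runs anywhere; the two-row Data Frame is a made-up stand-in for real data.

```python
import io
import pandas as pd

df = pd.DataFrame({'Make': ['Toyota', 'Honda'], 'Doors': [4, 4]})

# With no path, to_csv returns the CSV text as a string.
# By default the index is written as an extra, unnamed first column.
with_index = df.to_csv()
# index=False leaves the index out of the exported data
without_index = df.to_csv(index=False)

reimported_with = pd.read_csv(io.StringIO(with_index))
reimported_without = pd.read_csv(io.StringIO(without_index))

print(list(reimported_with.columns))     # ['Unnamed: 0', 'Make', 'Doors']
print(list(reimported_without.columns))  # ['Make', 'Doors']
```

An unexpected `Unnamed: 0` column after importing a CSV is often a sign that the file was exported with its index included.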


In [13]:
# Importing data
car_sales = pd.read_csv('../datasets/car-sales.csv')

# Exporting Data
car_sales.to_csv('../datasets/exported_car_sales.csv', index=False)

# Showing Data
car_sales

,Make,Colour,Odometer (KM),Doors,Price
0,Toyota,White,150043,4,"$4,000.00"
1,Honda,Red,87899,4,"$5,000.00"
2,Toyota,Blue,32549,3,"$7,000.00"
3,BMW,Black,11179,5,"$22,000.00"
4,Nissan,White,213095,4,"$3,500.00"
5,Toyota,Green,99213,4,"$4,500.00"
6,Honda,Blue,45698,4,"$7,500.00"
7,Honda,Blue,54738,4,"$7,000.00"
8,Toyota,White,60000,4,"$6,250.00"
9,Nissan,White,31600,4,"$9,700.00"


# Describing Data

There are many ways to describe data: by column names, values, index, or types. Here are some of the attributes and methods Data Frames offer for this.

- Types of the DF's columns: use the `.dtypes` attribute;
- Names of the DF's columns: use the `.columns` attribute;
- Values of the DF: use the `.values` attribute;
- Index range of the DF: use the `.index` attribute;
- Statistical summary of the numerical columns: use the `.describe()` method;
- Fuller overview of the data (columns, non-null counts, types, memory usage): use the `.info()` method.

These attributes and methods are very useful when first loading a project's data. Running `.describe()` and `.info()` is a great way of catching small errors early in the project.


In [14]:
# Data Frames types
car_sales.dtypes

Make             object
Colour           object
Odometer (KM)     int64
Doors             int64
Price            object
dtype: object

In [15]:
# Returns the index (row labels) of the DF
car_sales.index

RangeIndex(start=0, stop=10, step=1)

In [16]:
# Returns all of the DF's values
car_sales.values

array([['Toyota', 'White', 150043, 4, '$4,000.00'],
       ['Honda', 'Red', 87899, 4, '$5,000.00'],
       ['Toyota', 'Blue', 32549, 3, '$7,000.00'],
       ['BMW', 'Black', 11179, 5, '$22,000.00'],
       ['Nissan', 'White', 213095, 4, '$3,500.00'],
       ['Toyota', 'Green', 99213, 4, '$4,500.00'],
       ['Honda', 'Blue', 45698, 4, '$7,500.00'],
       ['Honda', 'Blue', 54738, 4, '$7,000.00'],
       ['Toyota', 'White', 60000, 4, '$6,250.00'],
       ['Nissan', 'White', 31600, 4, '$9,700.00']], dtype=object)

In [17]:
# Returns the DF's column names
car_sales.columns

Index(['Make', 'Colour', 'Odometer (KM)', 'Doors', 'Price'], dtype='object')

In [18]:
# Describing the numerical data
car_sales.describe()

,Odometer (KM),Doors
count,10.0,10.0
mean,78601.4,4.0
std,61983.471735,0.471405
min,11179.0,3.0
25%,35836.25,4.0
50%,57369.0,4.0
75%,96384.5,4.0
max,213095.0,5.0


In [19]:
# More complete information on the DF's types
car_sales.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Make           10 non-null     object
 1   Colour         10 non-null     object
 2   Odometer (KM)  10 non-null     int64 
 3   Doors          10 non-null     int64 
 4   Price          10 non-null     object
dtypes: int64(2), object(3)
memory usage: 532.0+ bytes


# Operating with Data

There are also methods that perform calculations on the data and help summarize the values of individual columns.

- `.mean()` returns the mean of the numerical columns' values;
- `.sum()` returns the sum of each column's values (string columns are concatenated).


In [20]:
# Calculating the mean of a Series
car_prices = pd.Series([300, 1500, 111250])
car_prices.mean()

37683.333333333336

In [21]:
# Calculating the sum of each DF column
car_sales.sum()

Make             ToyotaHondaToyotaBMWNissanToyotaHondaHondaToyo...
Colour               WhiteRedBlueBlackWhiteGreenBlueBlueWhiteWhite
Odometer (KM)                                               786014
Doors                                                           40
Price            $4,000.00$5,000.00$7,000.00$22,000.00$3,500.00...
dtype: object
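
In the output above, the text columns are concatenated rather than added. One way to avoid that, sketched here on a small made-up Data Frame that mirrors a few columns of the car sales data, is to pass `numeric_only=True` or to select a single numerical column first.

```python
import pandas as pd

# Hypothetical sample mirroring part of the car sales data
df = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'BMW'],
    'Odometer (KM)': [150043, 87899, 11179],
    'Doors': [4, 4, 5],
})

# numeric_only=True skips the text columns instead of concatenating them
totals = df.sum(numeric_only=True)

# Selecting one column gives a Series, so .mean() returns a single number
doors_mean = df['Doors'].mean()

print(totals['Odometer (KM)'])  # 249121
print(totals['Doors'])          # 13
```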