## Pandas is built on top of the NumPy library

Pandas is a Python library built on top of NumPy. It provides easy-to-use data structures and data analysis tools for handling tabular data. Because its columns are typically backed by NumPy arrays, Pandas can manipulate and analyze large datasets efficiently.


In [1]:
import pandas as pd
import numpy as np
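
To make the relationship with NumPy concrete, here is a minimal sketch that wraps a NumPy array in a DataFrame and then converts a column back to a NumPy array. The array values and the column names `a` and `b` are made up for illustration and are not part of the dataset used later.

```python
import numpy as np
import pandas as pd

# A small 2-D NumPy array (illustrative values, not from the dataset)
arr = np.array([[1, 2], [3, 4], [5, 6]])

# Wrap the array in a DataFrame with hypothetical column names
demo = pd.DataFrame(arr, columns=["a", "b"])

# A numeric column converts back to a NumPy array
col = demo["a"].to_numpy()
print(type(col))   # <class 'numpy.ndarray'>
print(col.sum())   # NumPy operations work directly on the result
```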

## Pandas is designed to work with tabular data

Pandas is designed to work with tabular data, such as data stored in a spreadsheet or database. It can load data from a variety of formats, including CSV files, Excel spreadsheets, and SQL tables. Pandas also handles one-dimensional data through its Series object and has dedicated support for time series data.

Tabular data is any data that can be represented as rows and columns.
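
As a rough sketch of how loading works, the example below reads CSV text from an in-memory buffer instead of a file, so it runs without any files on disk. The two rows are taken from the Transjakarta dataset used in this notebook; the variable names `csv_text` and `sample` are only illustrative.

```python
import io
import pandas as pd

# CSV text with a header row and two data rows from the Transjakarta dataset
csv_text = """tahun,bulan,jenis,kode_trayek,trayek,jumlah_penumpang
2021,11,Mikrotrans,JAK.88,Terminal Tanjung Priok - Ancol Barat,40135
2021,11,Mikrotrans,JAK.85,Bintara - Cipinang Indah,38487
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))
print(sample)
```

Each line of the CSV becomes a row, and each comma-separated field becomes a column.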

## Pandas DataFrame

In Pandas, tabular (rectangular) data is represented as a DataFrame object.

Let's load the CSV file into a Pandas DataFrame object:

In [4]:
df = pd.read_csv("./datasets/data-penumpang-bus-transjakarta-november-2021.csv", delimiter=",")

### ```.head()``` method

The ```.head()``` method displays the first five rows of the DataFrame by default.
This becomes very handy when you have a large dataset and want a quick look at the data.

In [5]:
df.head()

| | tahun | bulan | jenis | kode_trayek | trayek | jumlah_penumpang |
|---|---|---|---|---|---|---|
| 0 | 2021 | 11 | Mikrotrans | JAK.88 | Terminal Tanjung Priok - Ancol Barat | 40135 |
| 1 | 2021 | 11 | Mikrotrans | JAK.85 | Bintara - Cipinang Indah | 38487 |
| 2 | 2021 | 11 | Mikrotrans | JAK.84 | Terminal Kampung Melayu - Kapin Raya | 49142 |
| 3 | 2021 | 11 | Mikrotrans | JAK.80 | Rawa Buaya - Rawa Kompeni | 66701 |
| 4 | 2021 | 11 | Mikrotrans | JA.77 | Tanjung Priok - Jembatan Item | 75248 |
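
The number of rows can be changed by passing an argument to ```.head()```. The sketch below uses a hypothetical ten-row DataFrame as a stand-in for the real dataset, and also shows ```.tail()```, which returns rows from the end instead.

```python
import pandas as pd

# Hypothetical stand-in for the dataset: ten rows with a single column
demo = pd.DataFrame({"jumlah_penumpang": range(10)})

first_three = demo.head(3)   # first 3 rows instead of the default 5
last_two = demo.tail(2)      # tail() is the counterpart for the last rows

print(first_three)
print(last_two)
```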


### ```.info()``` method

The ```.info()``` method displays information about the DataFrame, including the number of rows and columns, the data type of each column, the number of non-null values per column, and the total memory usage.

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 129 entries, 0 to 128
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   tahun             129 non-null    int64 
 1   bulan             129 non-null    int64 
 2   jenis             129 non-null    object
 3   kode_trayek       129 non-null    object
 4   trayek            129 non-null    object
 5   jumlah_penumpang  129 non-null    int64 
dtypes: int64(3), object(3)
memory usage: 6.2+ KB
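
The pieces of the ```.info()``` report can also be retrieved individually. The following sketch uses a small DataFrame built from three rows of the dataset. The `+` in `memory usage: 6.2+ KB` marks a lower bound, because the contents of object columns are not measured by default; passing `memory_usage="deep"` asks Pandas to measure them as well.

```python
import pandas as pd

# Three rows from the dataset, reduced to two columns for brevity
demo = pd.DataFrame({
    "kode_trayek": ["JAK.88", "JAK.85", "JAK.84"],
    "jumlah_penumpang": [40135, 38487, 49142],
})

# Data type of each column; text columns typically appear as object
# (or as a string dtype in newer Pandas versions)
print(demo.dtypes)

# Number of missing values per column (the complement of the non-null count)
missing = demo.isna().sum()
print(missing)

# Same report as info(), but with object columns measured in full
demo.info(memory_usage="deep")
```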


### ```.describe()``` method

The ```.describe()``` method displays summary statistics for the numeric columns of the DataFrame: the count, mean, standard deviation, minimum, maximum, and the 25th, 50th, and 75th percentiles.

In [12]:
df.describe()

# count is the number of non-missing values in each DataFrame column

| | tahun | bulan | jumlah_penumpang |
|---|---|---|---|
| count | 129.0 | 129.0 | 129.0 |
| mean | 2021.0 | 11.0 | 96267.01 |
| std | 0.0 | 0.0 | 148223.5 |
| min | 2021.0 | 11.0 | 0.0 |
| 25% | 2021.0 | 11.0 | 30703.0 |
| 50% | 2021.0 | 11.0 | 59685.0 |
| 75% | 2021.0 | 11.0 | 87210.0 |
| max | 2021.0 | 11.0 | 1073929.0 |
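
By default, ```.describe()``` skips non-numeric columns such as `jenis`. The sketch below, built from the five rows shown by ```.head()``` earlier, illustrates two optional parameters: `include="all"` to summarize every column, and `percentiles` to choose different cut points.

```python
import pandas as pd

# The five preview rows, reduced to one text and one numeric column
demo = pd.DataFrame({
    "jenis": ["Mikrotrans"] * 5,
    "jumlah_penumpang": [40135, 38487, 49142, 66701, 75248],
})

# Default: numeric columns only
numeric_summary = demo.describe()

# include="all" adds unique/top/freq rows for non-numeric columns
all_summary = demo.describe(include="all")

# Custom percentiles; the median (50%) is always included
custom_summary = demo.describe(percentiles=[0.1, 0.9])

print(numeric_summary)
print(all_summary)
print(custom_summary)
```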


### ```.shape``` attribute

The ```.shape``` attribute returns a tuple containing the number of rows and columns of the DataFrame.

In [9]:
df.shape # tuple(rows, columns)

(129, 6)
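
Because ```.shape``` is a plain tuple, it can be unpacked into separate variables. A minimal sketch with a hypothetical two-column DataFrame:

```python
import pandas as pd

# Hypothetical DataFrame with 3 rows and 2 columns
demo = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Unpack the tuple into row and column counts
n_rows, n_cols = demo.shape
print(n_rows, n_cols)   # 3 2

# len() gives the row count; size gives the total number of cells
print(len(demo))        # 3
print(demo.size)        # 6
```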

## DataFrame Components

A DataFrame consists of three components, each accessible through its own attribute: the values, the column labels, and the row index.

### ```.values``` attribute

The ```.values``` attribute returns a two-dimensional NumPy array of values. 

In [13]:
df.values # numpy array

array([[2021, 11, 'Mikrotrans', 'JAK.88',
        'Terminal Tanjung Priok - Ancol Barat', 40135],
       [2021, 11, 'Mikrotrans', 'JAK.85', 'Bintara - Cipinang Indah',
        38487],
       [2021, 11, 'Mikrotrans', 'JAK.84',
        'Terminal Kampung Melayu - Kapin Raya', 49142],
       [2021, 11, 'Mikrotrans', 'JAK.80', 'Rawa Buaya - Rawa Kompeni',
        66701],
       [2021, 11, 'Mikrotrans', 'JA.77', 'Tanjung Priok - Jembatan Item',
        75248],
       [2021, 11, 'Mikrotrans', 'JAK.75', 'Cililitan - Kp. Pulo', 56825],
       [2021, 11, 'Mikrotrans', 'JAK.74',
        'Terminal Rawamangun - Cipinang Muara', 55667],
       [2021, 11, 'Mikrotrans', 'JAK.73', 'Jambore Cibubur - Pasar Rebo',
        84918],
       [2021, 11, 'Mikrotrans', 'JAK.72',
        'Kampung Rambutan - Pasar Rebo via Poncol', 100864],
       [2021, 11, 'Mikrotrans', 'JAK.71',
        'Kampung Rambutan - Pinang Ranti', 65152],
       [2021, 11, 'Mikrotrans', 'JAK.64', 'Lenteng Agung - Aseli', 61617],
       ...
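
When a DataFrame mixes integers and strings, as this one does, the resulting array has to use the generic `object` dtype. The sketch below shows this with two rows from the dataset. It uses ```.to_numpy()```, which the Pandas documentation recommends over ```.values``` and which returns the same kind of array here.

```python
import pandas as pd

# Two rows from the dataset with one text and one integer column
demo = pd.DataFrame({
    "kode_trayek": ["JAK.88", "JAK.85"],
    "jumlah_penumpang": [40135, 38487],
})

# Mixed column types fall back to a generic object array
mixed = demo.to_numpy()
print(mixed.dtype)     # object

# Selecting only the numeric column keeps a numeric dtype
numeric = demo[["jumlah_penumpang"]].to_numpy()
print(numeric.dtype)   # int64
print(numeric.shape)   # (2, 1)
```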

### ```.columns``` & ```.index``` attributes

The ```.columns``` attribute returns an Index object that holds the column labels (the column names) of the DataFrame.

The ```.index``` attribute returns an Index object that holds the row labels, which can be row numbers or row names.

In [14]:
df.columns

Index(['tahun', 'bulan', 'jenis', 'kode_trayek', 'trayek', 'jumlah_penumpang'], dtype='object')

In [33]:
df.index

RangeIndex(start=0, stop=129, step=1)
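
To see how the three components fit together, the sketch below rebuilds a DataFrame from its values, columns, and index, and then replaces the default row numbers with row names via `set_index`. It uses two rows from the dataset; the variable names are illustrative.

```python
import pandas as pd

# Two rows from the dataset
demo = pd.DataFrame({
    "kode_trayek": ["JAK.88", "JAK.85"],
    "jumlah_penumpang": [40135, 38487],
})

# Values + column labels + row index are enough to rebuild the table
# (the rebuilt columns have dtype object because the array is mixed)
rebuilt = pd.DataFrame(demo.to_numpy(), columns=demo.columns, index=demo.index)
print(rebuilt)

# Use a column as the row index: rows are now labelled by route code
by_code = demo.set_index("kode_trayek")
print(by_code.index)
print(by_code.loc["JAK.85", "jumlah_penumpang"])   # 38487
```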