## PANDAS

Pandas is a powerful, widely used open-source Python library for data manipulation and analysis. It provides high-performance data structures and tools for working with structured data, making it an essential tool for data scientists and analysts.

In [1]:
import pandas as pd

Pandas has two main objects: Series and DataFrame.

### Series
The Series object is a ONE-dimensional array that can hold data of various types, such as integers, floats, and strings. Because it consists of only one column, it has no column names (although it can carry a single optional `name`). Each element in a Series has a label, and these labels together form the index.
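A minimal sketch of that structure, using made-up values and a made-up name `scores`: a Series holds values plus an index, and the optional `name` takes the place of a column name.

```python
import pandas as pd

# A Series pairs each value with an index label; the name is optional.
scores = pd.Series([88, 92, 75], name="scores")

print(scores.name)         # scores
print(list(scores.index))  # [0, 1, 2]
print(scores.tolist())     # [88, 92, 75]
```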

In [2]:
data = [0.25, 0.50, 0.75, 1]

In [3]:
print(data)

[0.25, 0.5, 0.75, 1]


Convert the list into a Series

In [4]:
data = pd.Series(data)

In [5]:
data

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

#### QUIZ 1

Create an example of a Series.

In [6]:
name = ['Ageng', 'Ayu', 'Dinda', 'Putri']

In [7]:
print(name)

['Ageng', 'Ayu', 'Dinda', 'Putri']


In [8]:
name = pd.Series(name)

In [9]:
name

0    Ageng
1      Ayu
2    Dinda
3    Putri
dtype: object

Retrieve the values of the Series as a NumPy array

In [10]:
data.values

array([0.25, 0.5 , 0.75, 1.  ])
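The pandas documentation recommends `Series.to_numpy()` (or `Series.array`) over `.values`. A short sketch showing that `to_numpy()` returns the same NumPy array for this data:

```python
import pandas as pd

data = pd.Series([0.25, 0.50, 0.75, 1])

# to_numpy() returns the values as a NumPy ndarray.
arr = data.to_numpy()
print(type(arr).__name__)  # ndarray
print(arr.tolist())        # [0.25, 0.5, 0.75, 1.0]
```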

Display the index.

The default index is a `RangeIndex`. Like Python's `range`, its start point is inclusive and its stop point is exclusive, so `start=0, stop=4` covers the positions 0, 1, 2, and 3.

In [11]:
data.index

RangeIndex(start=0, stop=4, step=1)

In [12]:
list(range(1,10))

[1, 2, 3, 4, 5, 6, 7, 8, 9]
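The same start-inclusive, stop-exclusive rule can be checked directly on a `RangeIndex` built with the parameters shown above:

```python
import pandas as pd

idx = pd.RangeIndex(start=0, stop=4, step=1)

# start is included, stop is excluded, exactly like Python's range().
print(list(idx))             # [0, 1, 2, 3]
print(list(range(0, 4, 1)))  # [0, 1, 2, 3]
print(4 in idx)              # False
```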

Accessing the data

In [13]:
data[2]

0.75

- The output is 0.75 because indexing in Python starts at 0, so index 2 refers to the third element, not the second.
- Implicit indices are the default integer positions (0, 1, 2, ...) assigned to the data.
- We can define our own indices, called explicit indices. These are the labels defined by the user.
- When defining explicit indices, the number of labels must match the number of data points; otherwise pandas raises a `ValueError`.
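A small sketch of the length rule from the last bullet: four values with only three labels fails, while four labels work.

```python
import pandas as pd

raised = False
try:
    # Four values but only three labels: pandas refuses to build the Series.
    pd.Series([0.25, 0.50, 0.75, 1], index=['a', 'b', 'c'])
except ValueError as err:
    raised = True
    print("ValueError:", err)

# With matching lengths the Series is created normally.
ok = pd.Series([0.25, 0.50, 0.75, 1], index=['a', 'b', 'c', 'd'])
print(list(ok.index))  # ['a', 'b', 'c', 'd']
```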

In [14]:
data = pd.Series([0.25, 0.50, 0.75, 1 ],  index = ['a','b','c','d'])

In [15]:
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [16]:
data.values

array([0.25, 0.5 , 0.75, 1.  ])

In [17]:
data.index

Index(['a', 'b', 'c', 'd'], dtype='object')

Access the data

In [35]:
#Explicit index

data['a']

0.25

This is data selection.

Even though we have defined explicit indices, we can still access elements by their implicit (positional) indices.

In [36]:
#Implicit index
data[3]

1.0
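In recent pandas 2.x releases, using an integer key as a position on a non-integer index like this one triggers a `FutureWarning`, because future versions will always treat integer keys as labels. A sketch of the warning-free alternative, using `.iloc` (covered in the loc and iloc section):

```python
import pandas as pd

data = pd.Series([0.25, 0.50, 0.75, 1], index=['a', 'b', 'c', 'd'])

by_label = data['d']        # explicit label
by_position = data.iloc[3]  # implicit position, without the FutureWarning
print(by_label, by_position)  # 1.0 1.0
```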

#### QUIZ 2
- Create a Series.
- Create explicit (custom) indices.
- Call explicit index.
- Call implicit index (default).

In [20]:
name = pd.Series(['Ageng', 'Ayu', 'Dinda', 'Putri'], index = ['1a', '2b', '3c', '4d'])

In [21]:
name

1a    Ageng
2b      Ayu
3c    Dinda
4d    Putri
dtype: object

In [22]:
name.values

array(['Ageng', 'Ayu', 'Dinda', 'Putri'], dtype=object)

In [23]:
name.index

Index(['1a', '2b', '3c', '4d'], dtype='object')

In [24]:
name['2b']

'Ayu'

In [25]:
name[3]

'Putri'

When the explicit indices are integers, just like the implicit ones, selecting a single element with `[]` relies only on the explicit indices.

In [26]:
data_2 = pd.Series([0.25, 0.50, 0.75, 1], index = [2,5,3,7])

In [27]:
data_2[2]

0.25

In [28]:
data_2[0]

KeyError: 0

`data_2[0]` raises a `KeyError` because 0 is looked up as an explicit label, and the index `[2, 5, 3, 7]` contains no label 0. It is not treated as position 0.
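A short sketch confirming that lookups on `data_2` are label-based: the `in` operator on a Series checks the index labels, and a missing label raises `KeyError`.

```python
import pandas as pd

data_2 = pd.Series([0.25, 0.50, 0.75, 1], index=[2, 5, 3, 7])

# `in` on a Series checks the explicit labels, not the positions.
print(2 in data_2)  # True
print(0 in data_2)  # False

# Looking up a label that does not exist raises KeyError.
key_error_raised = False
try:
    data_2[0]
except KeyError:
    key_error_raised = True
print(key_error_raised)  # True
```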

#### QUIZ 3
- Create a Series.
- Create explicit (custom) indices that are integers, like the default implicit indices.
- Call explicit index.
- Call implicit index (default).

In [29]:
name_2 = pd.Series(['Ageng', 'Ayu', 'Dinda', 'Putri'], index = [1,3,5,7])

In [30]:
name_2

1    Ageng
3      Ayu
5    Dinda
7    Putri
dtype: object

In [31]:
name_2.values

array(['Ageng', 'Ayu', 'Dinda', 'Putri'], dtype=object)

In [32]:
name_2.index

Index([1, 3, 5, 7], dtype='int64')

In [37]:
# Call explicit index

name_2[1]

'Ageng'

In [34]:
# Call implicit index (default)

name_2[0]

KeyError: 0

Next, let's perform data slicing.

In [38]:
data = pd.Series([0.25, 0.50, 0.75, 1], index = ['a', 'b', 'c', 'd'])

In [39]:
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [40]:
# For example, we will call from data 'b' to data 'c'

data['b':'c'] #explicit

b    0.50
c    0.75
dtype: float64

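Slicing by implicit indices behaves differently: as with Python's `range`, the start position is included but the stop position is excluded, so `data[1:2]` returns only the element at position 1. Slicing by explicit labels, as above, includes both the start and the stop label.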

In [42]:
data[1:2] #implicit

b    0.5
dtype: float64
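To select the same two rows both ways, the positional stop has to be one past the last wanted position, while the label stop is the last wanted label itself:

```python
import pandas as pd

data = pd.Series([0.25, 0.50, 0.75, 1], index=['a', 'b', 'c', 'd'])

label_slice = data['b':'c']  # explicit labels: stop label 'c' is included
position_slice = data[1:3]   # implicit positions: stop position 3 is excluded

print(list(label_slice.index))     # ['b', 'c']
print(list(position_slice.index))  # ['b', 'c']
```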

#### QUIZ 4
- Create a Series.
- Create explicit (custom) indices.
- Slice using explicit indices.
- Slice using implicit indices (default).

In [44]:
name_3 = pd.Series(['Ageng', 'Ayu', 'Dinda', 'Putri'], index = ['1ab', '2bc', '3cd', '4de'])

In [45]:
name_3

1ab    Ageng
2bc      Ayu
3cd    Dinda
4de    Putri
dtype: object

In [46]:
name_3['2bc':'4de'] #explicit

2bc      Ayu
3cd    Dinda
4de    Putri
dtype: object

In [47]:
name_3[0:2] #implicit

1ab    Ageng
2bc      Ayu
dtype: object

### loc and iloc

Example of data whose explicit indices are integers, so they can be confused with the implicit positions.

In [48]:
data_2 = pd.Series([0.25, 0.50, 0.75, 1], index = [2,5,3,7])

In [49]:
data_2

2    0.25
5    0.50
3    0.75
7    1.00
dtype: float64

When we select a single integer key, pandas treats it as an explicit label.

In [50]:
data_2[2] #explicit : selecting

0.25

In [51]:
data_2[2:3] #implicit : slicing

3    0.75
dtype: float64

When the explicit indices are integers, `[]` behaves inconsistently, as the case above shows: a single integer key is treated as an explicit label, while an integer slice is treated as implicit positions.

To resolve this inconsistency, we use `loc` and `iloc`.

`loc` always refers to explicit indices (labels), while `iloc` always refers to implicit indices (positions).

In [52]:
#loc

data_2.loc[3] #selecting explicit

0.75

In [53]:
data_2.loc[2:3] #slicing explicit

2    0.25
5    0.50
3    0.75
dtype: float64

In [54]:
#iloc

data_2.iloc[0] #selecting implicit

0.25

In [56]:
#iloc

data_2.iloc[0:2] #slicing implicit

2    0.25
5    0.50
dtype: float64
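Putting the cells above side by side: on `data_2`, a single integer key with `[]` matches `.loc`, an integer slice with `[]` matches `.iloc`, and the two accessors treat the slice end differently.

```python
import pandas as pd

data_2 = pd.Series([0.25, 0.50, 0.75, 1], index=[2, 5, 3, 7])

# A single integer key with [] behaves like .loc (explicit label)...
print(data_2[2] == data_2.loc[2])            # True
# ...but an integer slice with [] behaves like .iloc (implicit position).
print(data_2[2:3].equals(data_2.iloc[2:3]))  # True

# .loc slices include the end label; .iloc slices exclude the end position.
print(data_2.loc[2:3].tolist())   # [0.25, 0.5, 0.75]
print(data_2.iloc[2:3].tolist())  # [0.75]
```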

#### QUIZ 5
- Create a Series.
- Create explicit indices.
- Call loc.
- Call iloc.

In [57]:
name_3 = pd.Series(['Ageng', 'Ayu', 'Dinda', 'Putri'], index = [1,3,5,7])

In [58]:
name_3 # display the Series

1    Ageng
3      Ayu
5    Dinda
7    Putri
dtype: object

In [59]:
name_3[3:5] #implicit : slicing

7    Putri
dtype: object

In [60]:
name_3.loc[5] #(loc) explicit : selecting

'Dinda'

In [61]:
name_3.loc[1:3] #(loc) explicit : slicing

1    Ageng
3      Ayu
dtype: object

In [62]:
name_3.iloc[0] #(iloc) implicit : selecting

'Ageng'

In [63]:
name_3.iloc[1:3] #(iloc) implicit : slicing

3      Ayu
5    Dinda
dtype: object

#### From a Dictionary to a Series

In [65]:
dict_population = {'Jakarta':750,
                'Bogor': 490,
                'Depok': 350,
                'Tanggerang': 270,
                'Bekasi': 670}
#This is just an example, not the actual population numbers.

In [66]:
dict_population

{'Jakarta': 750, 'Bogor': 490, 'Depok': 350, 'Tanggerang': 270, 'Bekasi': 670}

In [67]:
#Transforming a dictionary into a series.

population = pd.Series(dict_population)

In [68]:
population

Jakarta       750
Bogor         490
Depok         350
Tanggerang    270
Bekasi        670
dtype: int64
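When a dictionary is converted, its keys become the explicit index in insertion order. Passing `index=` as well keeps only the listed keys, and a listed key that is missing from the dictionary becomes `NaN`. In this sketch, `'Solo'` is a hypothetical key added only to show that behavior.

```python
import pandas as pd

dict_population = {'Jakarta': 750, 'Bogor': 490, 'Depok': 350,
                   'Tanggerang': 270, 'Bekasi': 670}

population = pd.Series(dict_population)
# Dictionary keys become the explicit index, in insertion order.
print(list(population.index) == list(dict_population))  # True

# index= selects and orders keys; 'Solo' (hypothetical) is not in the dict, so it is NaN.
subset = pd.Series(dict_population, index=['Depok', 'Jakarta', 'Solo'])
print(subset)
```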

In [69]:
population.loc['Depok']

350

In [71]:
population.iloc[2]

350

In [72]:
dict_area = {'Jakarta':737,
            'Bogor':325,
            'Depok':247,
            'Tanggerang':302,
            'Bekasi':355}

#This is just an example, not the actual land area numbers.

In [73]:
area = pd.Series(dict_area)

In [74]:
area

Jakarta       737
Bogor         325
Depok         247
Tanggerang    302
Bekasi        355
dtype: int64
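Because `population` and `area` share the same labels, they can be combined element by element. A sketch, assuming we want population per unit of area; since both sets of numbers are made up, the result is not a real density. pandas matches values by index label, so reversing the order of one Series does not change the result for a given city.

```python
import pandas as pd

population = pd.Series({'Jakarta': 750, 'Bogor': 490, 'Depok': 350,
                        'Tanggerang': 270, 'Bekasi': 670})
area = pd.Series({'Jakarta': 737, 'Bogor': 325, 'Depok': 247,
                  'Tanggerang': 302, 'Bekasi': 355})

# Arithmetic between Series matches values by index label, not by position.
density = population / area
print(density.round(2))

reversed_area = area.iloc[::-1]  # same labels, opposite order
density_2 = population / reversed_area
print(density_2.loc['Depok'] == density.loc['Depok'])  # True
```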

#### QUIZ 6
- Create a Series from a dictionary.
- Call loc.
- Call iloc.

In [84]:
personal_information = {'Name': 'Ageng',
                        'Age': 24,
                        'Occupation':'Lecturer',
                        'City': 'Yogyakarta',
                        'Address': 'Jl. Merdeka No.1'}

In [85]:
personal_information

{'Name': 'Ageng',
 'Age': 24,
 'Occupation': 'Lecturer',
 'City': 'Yogyakarta',
 'Address': 'Jl. Merdeka No.1'}

In [86]:
information = pd.Series(personal_information)

In [87]:
information

Name                     Ageng
Age                         24
Occupation            Lecturer
City                Yogyakarta
Address       Jl. Merdeka No.1
dtype: object

In [88]:
information.loc['Name']

'Ageng'

In [89]:
information.iloc[1]

24
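Since this Series mixes strings and an integer, pandas stores it with the generic `object` dtype. Because `'Age'` is the second key, the label `'Age'` and position 1 point to the same element:

```python
import pandas as pd

information = pd.Series({'Name': 'Ageng', 'Age': 24, 'Occupation': 'Lecturer',
                         'City': 'Yogyakarta', 'Address': 'Jl. Merdeka No.1'})

# Mixed strings and integers are stored with the generic object dtype.
print(information.dtype)  # object
# 'Age' is the second key, so label 'Age' and position 1 give the same value.
print(information.loc['Age'] == information.iloc[1])  # True
```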

In [90]:
additional_information = {'Name': 'Anna',
                          'Age': 25,
                          'Occupation': 'Designer',
                          'City': 'Jakarta',
                          'Address': 'Jl. Sultan Agung No. 2'}

In [91]:
additional_information = pd.Series(additional_information)

In [92]:
additional_information

Name                            Anna
Age                               25
Occupation                  Designer
City                         Jakarta
Address       Jl. Sultan Agung No. 2
dtype: object

In [93]:
additional_information.loc['City']

'Jakarta'

In [94]:
additional_information.iloc[0]

'Anna'