# Pandas Library

Pandas is well-suited for working with tabular data, such as spreadsheets or SQL tables.

## Data Structures in Pandas Library
Pandas provides two primary data structures for manipulating data:
* Series
* DataFrame

In [1]:
# import libraries

import pandas as pd
import numpy as np

## Pandas Series

A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). Its axis labels are collectively called the index.
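
A small sketch of those two properties, "any type" and "labeled". The values are made up for illustration. When the elements have mixed types, pandas falls back to the generic `object` dtype. When no index is given, the labels default to 0, 1, 2, and so on.

```python
import pandas as pd

# Mixing integers, strings, floats and None forces the generic 'object' dtype
mixed = pd.Series([1, 'two', 3.0, None])
print(mixed.dtype)          # object

# Without an explicit index, the labels default to 0, 1, 2, ...
numbers = pd.Series([10, 20, 30])
print(list(numbers.index))  # [0, 1, 2]
```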

In [2]:
# creating a simple list

prima = [2,3,5,7,11,13,17,19]

print(prima)

[2, 3, 5, 7, 11, 13, 17, 19]


In [3]:
# Creating a Series from the simple array

prima = pd.Series(prima) # note: 'Series' must start with a capital 'S'

prima

0     2
1     3
2     5
3     7
4    11
5    13
6    17
7    19
dtype: int64

In [4]:
# convert the Series to a NumPy array

prima.values

array([ 2,  3,  5,  7, 11, 13, 17, 19], dtype=int64)

In [5]:
# display the index

prima.index

RangeIndex(start=0, stop=8, step=1)

In [6]:
# accessing an element by its index label

prima[3]

7

Create a Series with an explicit string index.

In [7]:
prima_2 = pd.Series([2, 3, 5, 7, 11, 13, 17, 19], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])

print(prima_2)

a     2
b     3
c     5
d     7
e    11
f    13
g    17
h    19
dtype: int64


The `index` attribute of a Series is used to get or set its index labels.

In [8]:
prima_2.index

Index(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], dtype='object')
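
Because `index` can also be set, existing labels can be replaced by assigning a new list to it. A minimal sketch with made-up values. The new list is assumed to have exactly as many labels as the Series has elements, otherwise pandas raises a `ValueError`.

```python
import pandas as pd

s = pd.Series([2, 3, 5])
print(list(s.index))   # [0, 1, 2]

# Assigning to .index replaces the labels; the list length must match the Series length
s.index = ['x', 'y', 'z']
print(s['y'])          # 3
```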

Indexing with a label uses the explicit index.

In [9]:
prima_2['g']

17

The term 'implicit index', on the other hand, refers to the integer positions (0, 1, 2, and so on) that every element in a Series or DataFrame has. Pandas also uses these positions as the default index when no explicit index is provided.

In [10]:
# retrieve data using the implicit index

prima[4]

11

When the explicit index is itself made of integers, indexing with `[]` uses only the explicit labels, not the implicit positions.

For example:

In [11]:
data = [2,3,5,7,9,11,13,17,19]

In [12]:
data = pd.Series(data)

data

0     2
1     3
2     5
3     7
4     9
5    11
6    13
7    17
8    19
dtype: int64

In [13]:
# with a custom integer index

data = pd.Series([2,3,5,7,9,11,13,17,19], index = [1, 2, 3, 4, 5, 6, 7, 8, 9])

data

1     2
2     3
3     5
4     7
5     9
6    11
7    13
8    17
9    19
dtype: int64

In [14]:
data[1] # label 1 is used, so this returns 2, not the value at position 1

2

In [15]:
data[0] # raises a KeyError: there is no label 0, and [] does not fall back to positions here

KeyError: 0

Slicing with implicit and explicit indices

In [17]:
data_2 = pd.Series([0.2, 0.3, 0.5, 0.7], index = ['a', 'b', 'c', 'd'])

data_2

a    0.2
b    0.3
c    0.5
d    0.7
dtype: float64

In [18]:
# explicit index: the end label 'c' is included

data_2['a':'c']

a    0.2
b    0.3
c    0.5
dtype: float64

In [19]:
# implicit index: the end position 1 is excluded

data_2[0:1]

a    0.2
dtype: float64
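
The two kinds of slicing differ at the end of the slice: a label slice includes its end label, while a positional slice excludes its end position, just like a Python list. The sketch below uses `.iloc` (introduced in the next section) for the positional slice so that the intent is explicit.

```python
import pandas as pd

s = pd.Series([0.2, 0.3, 0.5, 0.7], index=['a', 'b', 'c', 'd'])

# Label-based slice: the end label 'c' IS included
by_label = s['a':'c']
print(list(by_label.index))     # ['a', 'b', 'c']

# Position-based slice: the end position 2 is NOT included
by_position = s.iloc[0:2]
print(list(by_position.index))  # ['a', 'b']
```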

### loc and iloc

Creating a sample Series whose explicit index is also made of integers, so that explicit labels and implicit positions can easily be confused.

In [20]:
data_3 = pd.Series(['Dipa', 'Salsa', 'Irpan', 'Aisyah'], index = [2, 3, 4, 5])

data_3

2      Dipa
3     Salsa
4     Irpan
5    Aisyah
dtype: object

To make data retrieval unambiguous, we can use the `loc` and `iloc` attributes.

### loc
The `loc` attribute allows indexing and slicing that always reference the explicit index (slices include the end label).

In [21]:
data_3.loc[3]

'Salsa'

In [22]:
# slicing with loc attribute

data_3.loc[1:4]

2     Dipa
3    Salsa
4    Irpan
dtype: object

### iloc
The `iloc` attribute allows indexing and slicing that always refer to the implicit integer position (slices exclude the end position).

In [23]:
data_3.iloc[3]

'Aisyah'

In [24]:
# slicing with iloc attribute

data_3.iloc[1:4]

3     Salsa
4     Irpan
5    Aisyah
dtype: object
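
Putting the two side by side on the same Series shows why the distinction matters: the same key `3` selects different elements. Plain `[]` on an integer index behaves like `loc`.

```python
import pandas as pd

names = pd.Series(['Dipa', 'Salsa', 'Irpan', 'Aisyah'], index=[2, 3, 4, 5])

print(names.loc[3])   # Salsa  -> the element whose label is 3
print(names.iloc[3])  # Aisyah -> the fourth element (position 3)
print(names[3])       # Salsa  -> [] uses labels when the index is made of integers
```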

# DataFrame

A DataFrame, provided by the Pandas library, is a tabular data structure that stores data in a two-dimensional table of rows and columns. It can be thought of as a collection of one or more Series that share the same index.
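
A minimal sketch of the "collection of Series" view, using a made-up two-row table: every column shares the DataFrame's row index, and selecting a single column returns a Series.

```python
import pandas as pd

# Build a DataFrame from a dict of lists; every column shares the same row index
df = pd.DataFrame(
    {'Year': [2002, 2003], 'Age': [22, 21]},
    index=['Salsabilla', 'Rahma'],
)

# A single column taken on its own is a Series with the same index
age = df['Age']
print(type(age).__name__)  # Series
print(df.shape)            # (2, 2) -> 2 rows, 2 columns
```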

Creating a Series from a dictionary

In [3]:
# creating a simple dictionary

dict_year = {'Salsabilla':2002,
              'Rahma':2003,
              'Sagita':2001,
              'Sheno':2004}
dict_year

{'Salsabilla': 2002, 'Rahma': 2003, 'Sagita': 2001, 'Sheno': 2004}

In [4]:
# creating a Series from the dictionary (the keys become the index)

dict_year = pd.Series(dict_year)

dict_year

Salsabilla    2002
Rahma         2003
Sagita        2001
Sheno         2004
dtype: int64

In [5]:
# creating a simple dictionary

dict_age = {'Salsabilla': 22,
           'Rahma' : 21,
           'Sagita' : 23,
           'Sheno' : 20}
dict_age

{'Salsabilla': 22, 'Rahma': 21, 'Sagita': 23, 'Sheno': 20}

In [6]:
# creating a Series from the dictionary (the keys become the index)

dict_age = pd.Series(dict_age)

dict_age

Salsabilla    22
Rahma         21
Sagita        23
Sheno         20
dtype: int64

In [8]:
Student = pd.DataFrame({'Year':dict_year,'Age':dict_age}) # note: both 'D' and 'F' in DataFrame must be capitalized

Student

|  | Year | Age |
|---|---|---|
| Salsabilla | 2002 | 22 |
| Rahma | 2003 | 21 |
| Sagita | 2001 | 23 |
| Sheno | 2004 | 20 |


In [9]:
# you can also select a single column

Student['Age']

Salsabilla    22
Rahma         21
Sagita        23
Sheno         20
Name: Age, dtype: int64

In [10]:
# select one column, then one value from it by its index label

Student['Year']['Salsabilla']

2002
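
The chained form above works for reading a value. A common alternative passes the row label and the column name to a single `.loc` call, or to `.at`, which is meant for exactly one scalar. Unlike chained indexing, both forms also work reliably for assigning values. The sketch rebuilds the same `Student` data so that it runs on its own.

```python
import pandas as pd

student = pd.DataFrame(
    {'Year': [2002, 2003, 2001, 2004], 'Age': [22, 21, 23, 20]},
    index=['Salsabilla', 'Rahma', 'Sagita', 'Sheno'],
)

# Row label and column name in a single .loc call
year = student.loc['Salsabilla', 'Year']
# .at is a fast accessor for a single scalar value
age = student.at['Sheno', 'Age']
print(year, age)  # 2002 20
```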

Columns can also be accessed with dot notation, but with a caveat.

Accessing the `Age` column with `Student.Age` gives:

In [11]:
Student.Age

Salsabilla    22
Rahma         21
Sagita        23
Sheno         20
Name: Age, dtype: int64

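This works only because `Age` is a single word and a valid Python identifier. A column name containing a space, such as `Birth Year`, cannot be accessed this way, and a column whose name matches an existing DataFrame method (for example `count`) returns the method instead of the column.

So it is safer to access columns with bracket syntax: `df_name['column_name']`.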
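
A short sketch of both pitfalls, using a made-up DataFrame with a `Birth Year` column and a column named `count`. Brackets reach both columns, while `df.count` resolves to the `DataFrame.count` method.

```python
import pandas as pd

df = pd.DataFrame({'Birth Year': [2002, 2003], 'count': [5, 7]})

# Bracket access works for any column name
print(df['Birth Year'].tolist())  # [2002, 2003]
print(df['count'].tolist())       # [5, 7]

# Attribute access fails here: "df.Birth Year" is a syntax error,
# and df.count is the DataFrame.count method, not the column
print(callable(df.count))         # True
```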

In [12]:
Student['Age']

Salsabilla    22
Rahma         21
Sagita        23
Sheno         20
Name: Age, dtype: int64

The column names can also be changed if needed. Here the DataFrame is rebuilt with `Birth Year` instead of `Year`:

In [14]:
Student = pd.DataFrame({'Birth Year':dict_year,'Age':dict_age})

Student

|  | Birth Year | Age |
|---|---|---|
| Salsabilla | 2002 | 22 |
| Rahma | 2003 | 21 |
| Sagita | 2001 | 23 |
| Sheno | 2004 | 20 |
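
Rebuilding the DataFrame works, but an existing DataFrame can also be renamed in place of rebuilding it. `rename(columns=...)` takes a mapping from old names to new ones, leaves unlisted columns unchanged, and returns a new DataFrame. A small sketch with two of the rows:

```python
import pandas as pd

student = pd.DataFrame(
    {'Year': [2002, 2003], 'Age': [22, 21]},
    index=['Salsabilla', 'Rahma'],
)

# Only 'Year' is renamed; 'Age' is not in the mapping and stays as it is
student = student.rename(columns={'Year': 'Birth Year'})
print(student.columns.tolist())  # ['Birth Year', 'Age']
```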


In [15]:
Student['Birth Year']

Salsabilla    2002
Rahma         2003
Sagita        2001
Sheno         2004
Name: Birth Year, dtype: int64

In [20]:
Student['Birth Year']['Rahma':'Sheno'] # explicit index: the end label 'Sheno' is included

Rahma     2003
Sagita    2001
Sheno     2004
Name: Birth Year, dtype: int64

In [21]:
Student['Age'].iloc[0:3] # implicit index using iloc

Salsabilla    22
Rahma         21
Sagita        23
Name: Age, dtype: int64

# Load CSV Dataset

To load a CSV dataset in Python, you typically use the pandas `read_csv()` function, part of pandas' simple and powerful data manipulation and analysis toolset. Make sure the dataset file is in the same folder as the notebook, or pass the file's full path.

In [22]:
# the basic syntax for loading a dataset (import pandas first)

# df = pd.read_csv('name_dataset.csv')

In [23]:
# for example, load the Titanic dataset from a CSV file

df = pd.read_csv('Titanic.csv')
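
`read_csv()` accepts a file path or any file-like object. The sketch below feeds it a tiny inline CSV through `io.StringIO`, so it runs without `Titanic.csv`. The three rows are illustrative only; the empty `Age` field in the third row is made up to show that an empty field is read as NaN.

```python
import io

import pandas as pd

# A tiny inline CSV standing in for a file on disk (illustrative rows)
csv_text = """PassengerId,Survived,Age
1,0,22.0
2,1,38.0
3,1,
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)                  # (3, 3)
print(df['Age'].isnull().sum())  # 1 -> the empty field became NaN
```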

In [24]:
# look at the first rows of the data

df.head()

# by default, head() shows the first 5 rows; pass a number, such as df.head(10), to show a different number of rows

|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |
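
Both `head()` and `tail()` take an optional number of rows, with 5 as the default. A quick sketch on a made-up ten-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'value': range(10)})

print(df.head(3)['value'].tolist())  # [0, 1, 2]
print(df.tail(2)['value'].tolist())  # [8, 9]
print(len(df.head()))                # 5 (the default)
```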


In [25]:
# view a concise summary of the data: column types and non-null counts

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


In [26]:
# viewing the number of non-null values in each column.

df.notnull().sum()

PassengerId    891
Survived       891
Pclass         891
Name           891
Sex            891
Age            714
SibSp          891
Parch          891
Ticket         891
Fare           891
Cabin          204
Embarked       889
dtype: int64

In [27]:
# viewing the number of null values in each column.

df.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64
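
Raw counts are easier to judge as percentages. Because `isnull()` yields True/False values and the mean of booleans is the share of True values, `isnull().mean() * 100` gives the percentage of missing values per column. On the Titanic data above, this would report about 19.9% for `Age` (177 of 891), about 77.1% for `Cabin` (687 of 891) and about 0.2% for `Embarked`. The sketch uses a small made-up DataFrame so that it runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0, np.nan],
    'Cabin': [np.nan, 'C85', np.nan, np.nan],
    'Fare': [7.25, 71.28, 7.93, 53.1],
})

# Percentage of missing values in each column
missing_share = df.isnull().mean() * 100
print(missing_share)  # Age 50.0, Cabin 75.0, Fare 0.0
```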

In [28]:
# viewing the last 5 rows of the data.

df.tail()

|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0 | NaN | S |
| 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0 | B42 | S |
| 888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.45 | NaN | S |
| 889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0 | C148 | C |
| 890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.75 | NaN | Q |


In [29]:
# viewing the number of rows and columns.

df.shape

(891, 12)

In [30]:
# viewing the column names.

df.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

In [31]:
# viewing the index.

df.index

RangeIndex(start=0, stop=891, step=1)

In [32]:
# displaying summary statistics for the numerical columns.

df.describe()

|  | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 891.0 | 891.0 | 891.0 | 714.0 | 891.0 | 891.0 | 891.0 |
| mean | 446.0 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 1.0 | 0.0 | 1.0 | 0.42 | 0.0 | 0.0 | 0.0 |
| 25% | 223.5 | 0.0 | 2.0 | 20.125 | 0.0 | 0.0 | 7.9104 |
| 50% | 446.0 | 0.0 | 3.0 | 28.0 | 0.0 | 0.0 | 14.4542 |
| 75% | 668.5 | 1.0 | 3.0 | 38.0 | 1.0 | 0.0 | 31.0 |
| max | 891.0 | 1.0 | 3.0 | 80.0 | 8.0 | 6.0 | 512.3292 |


In [33]:
# for example, displaying the average of the Age column.

df['Age'].mean()

29.69911764705882

In [34]:
# for example, displaying the median of the Age column.

df['Age'].median()

28.0

In [35]:
# for example, displaying the mode of the Age column.

df['Age'].mode()[0]

# mode() returns a Series because there can be more than one mode; [0] takes the first (smallest) one

24.0
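
To see why the `[0]` is needed, here is a made-up Series with two equally common values. `mode()` returns both of them, sorted, so `[0]` picks the smallest.

```python
import pandas as pd

ages = pd.Series([20, 21, 21, 24, 24])

# 21 and 24 both appear twice, so there are two modes
modes = ages.mode()
print(modes.tolist())  # [21, 24]
print(modes[0])        # 21
```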

In [36]:
# for example, displaying the minimum value of the Age column.

df['Age'].min()

0.42

In [37]:
# for example, displaying the maximum value of the Age column.

df['Age'].max()

80.0
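
These statistics ignore missing values by default, which is why `describe()` reports a count of 714 rather than 891 for `Age`. A small sketch with made-up values shows the `skipna` parameter and `count()`:

```python
import numpy as np
import pandas as pd

ages = pd.Series([20.0, np.nan, 30.0])

# NaN is skipped by default: (20 + 30) / 2
print(ages.mean())              # 25.0
# With skipna=False, any NaN makes the result NaN
print(ages.mean(skipna=False))  # nan
# count() reports how many non-null values were used
print(ages.count())             # 2
```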

In [38]:
# for example, viewing the mean of the Age column.

df.Age.mean()

# dot notation works here because 'Age' is a single word; a column name with a space requires brackets, and brackets are the safer choice even for single-word names

29.69911764705882

In [39]:
# for example, selecting the rows where Age is NaN

df[df['Age'].isnull()]

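|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| 17 | 18 | 1 | 2 | Williams, Mr. Charles Eugene | male | NaN | 0 | 0 | 244373 | 13.0000 | NaN | S |
| 19 | 20 | 1 | 3 | Masselmani, Mrs. Fatima | female | NaN | 0 | 0 | 2649 | 7.2250 | NaN | C |
| 26 | 27 | 0 | 3 | Emir, Mr. Farred Chehab | male | NaN | 0 | 0 | 2631 | 7.2250 | NaN | C |
| 28 | 29 | 1 | 3 | O'Dwyer, Miss. Ellen "Nellie" | female | NaN | 0 | 0 | 330959 | 7.8792 | NaN | Q |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 859 | 860 | 0 | 3 | Razi, Mr. Raihed | male | NaN | 0 | 0 | 2629 | 7.2292 | NaN | C |
| 863 | 864 | 0 | 3 | Sage, Miss. Dorothy Edith "Dolly" | female | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S |
| 868 | 869 | 0 | 3 | van Melkebeke, Mr. Philemon | male | NaN | 0 | 0 | 345777 | 9.5000 | NaN | S |
| 878 | 879 | 0 | 3 | Laleff, Mr. Kristo | male | NaN | 0 | 0 | 349217 | 7.8958 | NaN | S |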


In [40]:
# the boolean mask on its own: True where Age is NaN

df['Age'].isnull()

# this returns a boolean Series covering the whole Age column, with True where the value is missing

0      False
1      False
2      False
3      False
4      False
       ...
886    False
887    False
888     True
889    False
890    False
Name: Age, Length: 891, dtype: bool
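
The same mask can be reused: summing it counts the missing values (True counts as 1), and passing it to `.loc` together with a column name selects only that column for the matching rows. A sketch on three made-up rows modeled on the passengers above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Braund', 'Moran', 'Williams'],
    'Age': [22.0, np.nan, np.nan],
})

mask = df['Age'].isnull()

print(mask.sum())                     # 2 -> number of missing ages
print(df.loc[mask, 'Name'].tolist())  # ['Moran', 'Williams']
```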

In [41]:
# for example, observing unique data from the Sex column.

df['Sex'].unique()

array(['male', 'female'], dtype=object)

In [42]:
# for example, observing unique data from the Pclass column.

df['Pclass'].unique()

array([3, 1, 2], dtype=int64)

In [43]:
# for example, counting the unique entries in the Sex column.

df.Sex.nunique()

2

Let's load another CSV dataset.

In [44]:
df = pd.read_csv('kpopidolsv3.csv')

df.head()

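|  | Stage Name | Full Name | Korean Name | K Stage Name | Date of Birth | Group | Debut | Company | Country | Second Country | Height | Weight | Birthplace | Other Group | Former Group | Gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2Soul | Kim Younghoon | 김영훈 | 이솔 | 10/09/1997 | 7 O'clock | 26/08/2014 | Jungle | South Korea | NaN | 172.0 | 55.0 | NaN | NaN | NaN | M |
| 1 | A.M | Seong Hyunwoo | 성현우 | 에이엠 | 31/12/1996 | Limitless | 9/07/2019 | ONO | South Korea | NaN | 181.0 | 62.0 | NaN | NaN | NaN | M |
| 2 | Ace | Jang Wooyoung | 장우영 | 에이스 | 28/08/1992 | VAV | 31/10/2015 | A team | South Korea | NaN | 177.0 | 63.0 | NaN | NaN | NaN | M |
| 3 | Aeji | Kwon Aeji | 권애지 | 애지 | 25/10/1999 | Hash Tag | 11/10/2017 | LUK | South Korea | NaN | 163.0 | NaN | Daegu | NaN | NaN | F |
| 4 | AhIn | Lee Ahin | 이아인 | 아인 | 27/09/1999 | MOMOLAND | 9/11/2016 | Double Kick | South Korea | NaN | 160.0 | 44.0 | Wonju | NaN | NaN | F |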


In [45]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1778 entries, 0 to 1777
Data columns (total 16 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Stage Name      1778 non-null   object 
 1   Full Name       1769 non-null   object 
 2   Korean Name     1768 non-null   object 
 3   K Stage Name    1777 non-null   object 
 4   Date of Birth   1776 non-null   object 
 5   Group           1632 non-null   object 
 6   Debut           1632 non-null   object 
 7   Company         1632 non-null   object 
 8   Country         1778 non-null   object 
 9   Second Country  62 non-null     object 
 10  Height          836 non-null    float64
 11  Weight          566 non-null    float64
 12  Birthplace      834 non-null    object 
 13  Other Group     140 non-null    object 
 14  Former Group    264 non-null    object 
 15  Gender          1778 non-null   object 
dtypes: float64(2), object(14)
memory usage: 222.4+ KB


In [46]:
df.Gender.value_counts()

Gender
M    889
F    889
Name: count, dtype: int64
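
`value_counts()` sorts the counts from most to least frequent. With `normalize=True` it returns proportions instead of counts; for the 889/889 split above, each gender would come out as 0.5. A sketch on a made-up Series:

```python
import pandas as pd

gender = pd.Series(['M', 'F', 'M', 'M'])

counts = gender.value_counts()                # absolute counts, most frequent first
shares = gender.value_counts(normalize=True)  # proportions that sum to 1
print(counts['M'], counts['F'])  # 3 1
print(shares['M'])               # 0.75
```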

### Unique & Nunique

The `unique()` and `nunique()` methods both deal with the distinct (non-repeated) values in a Series, but they return different things.

Unique: `unique()` returns the distinct values themselves, in order of first appearance. It is similar to calling `set()` on a plain Python list, which also keeps only unique elements (although a set does not preserve order).

In [47]:
df['Gender'].unique()

array(['M', 'F'], dtype=object)

Nunique: `nunique()` returns the number of distinct values (missing values are not counted by default). It does not return repeated or duplicate elements; for that, pandas provides `duplicated()`.

In [48]:
df['Gender'].nunique()

2
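
A side-by-side sketch of the three methods on a made-up Series: `unique()` lists the distinct values, `nunique()` counts them, and `duplicated()` marks every occurrence after the first, which is what finding repeated elements actually requires.

```python
import pandas as pd

gender = pd.Series(['M', 'F', 'M', 'M', 'F'])

print(gender.unique())    # ['M' 'F'] -> distinct values, in order of first appearance
print(gender.nunique())   # 2 -> how many distinct values
print(gender.duplicated().tolist())  # [False, False, True, True, True]
```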

In [49]:
# how to select specific columns: pass a list of column names

df[['Stage Name', 'Group', 'Company']]

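|  | Stage Name | Group | Company |
|---|---|---|---|
| 0 | 2Soul | 7 O'clock | Jungle |
| 1 | A.M | Limitless | ONO |
| 2 | Ace | VAV | A team |
| 3 | Aeji | Hash Tag | LUK |
| 4 | AhIn | MOMOLAND | Double Kick |
| ... | ... | ... | ... |
| 1773 | ZN | LABOUM | NH |
| 1774 | Zoa | Weeekly | Play M |
| 1775 | Zuho | SF9 | FNC |
| 1776 | Z-UK | NaN | NaN |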
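
The double brackets matter: the outer pair is the selection, the inner pair is a Python list of column names. Single brackets with one name return a Series, while a list, even one with a single name, returns a DataFrame. A sketch using two rows from above:

```python
import pandas as pd

df = pd.DataFrame({
    'Stage Name': ['2Soul', 'A.M'],
    'Group': ["7 O'clock", 'Limitless'],
    'Company': ['Jungle', 'ONO'],
})

one = df['Group']     # single name -> Series
many = df[['Group']]  # list of names -> DataFrame (here with one column)
print(type(one).__name__, type(many).__name__)  # Series DataFrame
```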


In [50]:
# assigning the selection to a variable

sel_columns = df[['Stage Name', 'Group', 'Company']]
sel_columns

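|  | Stage Name | Group | Company |
|---|---|---|---|
| 0 | 2Soul | 7 O'clock | Jungle |
| 1 | A.M | Limitless | ONO |
| 2 | Ace | VAV | A team |
| 3 | Aeji | Hash Tag | LUK |
| 4 | AhIn | MOMOLAND | Double Kick |
| ... | ... | ... | ... |
| 1773 | ZN | LABOUM | NH |
| 1774 | Zoa | Weeekly | Play M |
| 1775 | Zuho | SF9 | FNC |
| 1776 | Z-UK | NaN | NaN |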


In [51]:
# selecting the rows where Group is 'EXO'

df[df['Group'] == 'EXO']

|  | Stage Name | Full Name | Korean Name | K Stage Name | Date of Birth | Group | Debut | Company | Country | Second Country | Height | Weight | Birthplace | Other Group | Former Group | Gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 55 | Baekhyun | Byun Baekhyun | 변백현 | 백현 | 6/05/1992 | EXO | 8/04/2012 | SM | South Korea | NaN | 174.0 | 58.0 | Wonmi | EXO-CBX\| SuperM | NaN | M |
| 142 | Chanyeol | Park Chanyeol | 박찬열 | 찬열 | 27/11/1992 | EXO | 8/04/2012 | SM | South Korea | NaN | 185.0 | 70.0 | Seoul | NaN | NaN | M |
| 145 | Chen | Kim Jongdae | 김종대 | 첸 | 21/09/1992 | EXO | 8/04/2012 | SM | South Korea | NaN | 173.0 | 64.0 | Silheung | EXO-CBX | NaN | M |
| 175 | D.O. | Do Kyungsoo | 도경수 | 디오 | 12/01/1993 | EXO | 8/04/2012 | SM | South Korea | NaN | 173.0 | 60.0 | Gyeonggi | NaN | NaN | M |
| 843 | Kai | Kim Jongin | 김종인 | 카이 | 14/01/1994 | EXO | 8/04/2012 | SM | South Korea | NaN | 182.0 | 65.0 | Suncheon | SuperM | NaN | M |
| 925 | Lay | Zhang Yixing | 장이씽 | 레이 | 7/10/1991 | EXO | 8/04/2012 | SM | China | NaN | 177.0 | 60.0 | Changsa | NaN | NaN | M |
| 1225 | Sehun | Oh Sehun | 오세훈 | 세훈 | 12/04/1994 | EXO | 8/04/2012 | SM | South Korea | NaN | 181.0 | 63.0 | Seoul | NaN | NaN | M |
| 1394 | Suho | Kim Junmyeon | 김준면 | 수호 | 22/05/1991 | EXO | 8/04/2012 | SM | South Korea | NaN | 173.0 | 65.0 | Seoul | NaN | NaN | M |
| 1580 | Xiumin | Kim Minseok | 김민석 | 시우민 | 26/03/1990 | EXO | 8/04/2012 | SM | South Korea | NaN | 173.0 | 65.0 | Guri | EXO-CBX | NaN | M |


In [53]:
# for example, selecting the rows where Height is greater than 185 cm

df[df['Height'] > 185]

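|  | Stage Name | Full Name | Korean Name | K Stage Name | Date of Birth | Group | Debut | Company | Country | Second Country | Height | Weight | Birthplace | Other Group | Former Group | Gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 51 | Baek Seung | Kim Hyunwoo | 김현우 | 백승 | 5/10/2004 | EPEX | 8/06/2021 | C9 | South Korea | NaN | 186.0 | 64.0 | Seoul | NaN | NaN | M |
| 578 | Hyunsuk | Yoon Hyunsuk | 윤현석 | 현석 | 8/09/2001 | CIX | 23/07/2019 | C9 | South Korea | NaN | 188.0 | NaN | NaN | NaN | NaN | M |
| 695 | Jihun | Kim Jihun | 김지훈 | 지훈 | 20/02/1995 | KNK | 3/03/2016 | YNB | South Korea | NaN | 186.0 | 73.0 | NaN | NaN | NaN | M |
| 792 | Jukang | Lee Hyowon | 이효원 | 주강 | 22/11/2002 | JWiiver | 17/02/2022 | JTG | South Korea | NaN | 186.0 | NaN | Daegu | NaN | NaN | M |
| 896 | Kris | Wu Yifan | 오역범 | 크리스 | 6/11/1990 | NaN | NaN | NaN | Canada | China | 187.0 | 73.0 | Guangzhou | NaN | EXO | M |
| 915 | Kyungmin | Jo Kyungmin | 조경민 | 경민 | 28/10/2004 | 8TURN | 30/01/2023 | MNH | South Korea | NaN | 187.0 | 60.0 | Gyeonggi | NaN | NaN | M |
| 964 | Lou | Kim Hosung | 김호성 | 로우 | 21/12/1996 | VAV | 31/10/2015 | A team | South Korea | NaN | 187.0 | 69.0 | NaN | NaN | NaN | M |
| 1235 | Seoham | Park Seoham | 박서함 | 서함 | 28/10/1993 | KNK | 3/03/2016 | YNB | South Korea | NaN | 190.0 | 70.0 | NaN | NaN | NaN | M |
| 1415 | Sunghyun | Kim Sunghyun | 김성현 | 성현 | 16/03/1996 | IN2IT | 27/01/2016 | Star Empire | South Korea | NaN | 186.0 | 71.0 | NaN | NaN | NaN | M |
| 1479 | Takuya | Terada Takuya | 테라다 타쿠야 | 타쿠야 | 18/03/1992 | CROSS GENE | 8/07/2012 | Amuse | Japan | NaN | 187.0 | 68.0 | Moriya | NaN | NaN | M |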


In [54]:
# using negation (~) to retrieve the rows where Weight is NOT greater than 45 kg; rows with a missing Weight are included too, because NaN > 45 evaluates to False

df[~(df['Weight'] > 45)]

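|  | Stage Name | Full Name | Korean Name | K Stage Name | Date of Birth | Group | Debut | Company | Country | Second Country | Height | Weight | Birthplace | Other Group | Former Group | Gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Aeji | Kwon Aeji | 권애지 | 애지 | 25/10/1999 | Hash Tag | 11/10/2017 | LUK | South Korea | NaN | 163.0 | NaN | Daegu | NaN | NaN | F |
| 4 | AhIn | Lee Ahin | 이아인 | 아인 | 27/09/1999 | MOMOLAND | 9/11/2016 | Double Kick | South Korea | NaN | 160.0 | 44.0 | Wonju | NaN | NaN | F |
| 5 | Ahra | Go Ahra | 고아라 | 아라 | 21/02/2001 | Favorite | 5/07/2017 | Astory | South Korea | NaN | NaN | NaN | Yeosu | NaN | NaN | F |
| 6 | Ahyeon | Jung Ahyeon | 정아현 | 아현 | 11/04/2007 | BABYMONSTER | 0/01/1900 | YG | South Korea | NaN | NaN | NaN | NaN | NaN | NaN | F |
| 7 | Ahyoon | Choi Subin | 최수빈 | 아윤 | 23/10/2004 | BOTOPASS | 26/08/2020 | WKS ENE | South Korea | NaN | NaN | NaN | NaN | NaN | NaN | F |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1768 | Zero | Nasukawa Shota | 나스카와 쇼타 | 제로 | 20/01/2003 | T1419 | 21/09/2007 | CJ E&M | Japan | NaN | NaN | NaN | NaN | NaN | NaN | M |
| 1771 | Zin | Jin Hyunbin | 진현빈 | 지인 | 31/08/2001 | bugAboo | 25/10/2021 | A team | South Korea | NaN | NaN | NaN | NaN | NaN | NaN | F |
| 1774 | Zoa | Cho Hyewon | 조혜원 | 조아 | 31/05/2005 | Weeekly | 30/07/2020 | Play M | South Korea | NaN | 170.0 | NaN | NaN | NaN | NaN | F |
| 1775 | Zuho | Bae Juho | 백주호 | 주호 | 4/07/1996 | SF9 | 5/10/2016 | FNC | South Korea | NaN | NaN | NaN | NaN | NaN | NaN | M |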
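
The NaN rows in the result above come from how comparisons treat missing values: `NaN > 45` is False, so `~` flips it to True. If only rows with a known weight of at most 45 are wanted, a direct comparison is the simpler choice. A sketch with made-up weights:

```python
import numpy as np
import pandas as pd

weight = pd.Series([44.0, 50.0, np.nan])

# Negation turns the False produced by NaN into True
print((~(weight > 45)).tolist())  # [True, False, True]
# A direct comparison keeps only known values at or below 45
print((weight <= 45).tolist())    # [True, False, False]
```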


In [55]:
# combining conditions on 2 columns with & (and): Height > 150 and Weight <= 40

df[(df['Height'] > 150) & (df['Weight'] <= 40)]

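|  | Stage Name | Full Name | Korean Name | K Stage Name | Date of Birth | Group | Debut | Company | Country | Second Country | Height | Weight | Birthplace | Other Group | Former Group | Gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 239 | Dohee | Kwon Dohee | 권도희 | 도희 | 1/08/2002 | Cignature | 4/02/2020 | C9 | South Korea | NaN | 158.0 | 39.0 | Seoul | NaN | NaN | F |
| 240 | Dohee | Min Dohee | 민도희 | 도희 | 25/09/1994 | Tiny-G | 23/08/2012 | GNG | South Korea | NaN | 152.0 | 39.0 | NaN | NaN | NaN | F |
| 306 | Eunchae | Son Eunchae | 손은채 | 은채 | 6/10/1999 | bugAboo | 25/10/2021 | A team | South Korea | NaN | 154.0 | 38.0 | Pohang | NaN | NaN | F |
| 1395 | Suhye | Kim Suhye | 김수혜 | 수혜 | 13/12/2004 | LIMELIGHT | 17/02/2023 | 143 | South Korea | NaN | 158.0 | 40.0 | Incheon | NaN | NaN | F |
| 1714 | Yubin | Cho Yubin | 조유빈 | 유빈 | 9/10/1999 | NaN | NaN | NaN | South Korea | NaN | 156.0 | 40.0 | NaN | NaN | Pink Fantasy | F |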


In [56]:
# combining conditions on 2 columns with | (or): Height > 150 or Weight <= 40

df[(df['Height'] > 150) | (df['Weight'] <= 40)]

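|  | Stage Name | Full Name | Korean Name | K Stage Name | Date of Birth | Group | Debut | Company | Country | Second Country | Height | Weight | Birthplace | Other Group | Former Group | Gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2Soul | Kim Younghoon | 김영훈 | 이솔 | 10/09/1997 | 7 O'clock | 26/08/2014 | Jungle | South Korea | NaN | 172.0 | 55.0 | NaN | NaN | NaN | M |
| 1 | A.M | Seong Hyunwoo | 성현우 | 에이엠 | 31/12/1996 | Limitless | 9/07/2019 | ONO | South Korea | NaN | 181.0 | 62.0 | NaN | NaN | NaN | M |
| 2 | Ace | Jang Wooyoung | 장우영 | 에이스 | 28/08/1992 | VAV | 31/10/2015 | A team | South Korea | NaN | 177.0 | 63.0 | NaN | NaN | NaN | M |
| 3 | Aeji | Kwon Aeji | 권애지 | 애지 | 25/10/1999 | Hash Tag | 11/10/2017 | LUK | South Korea | NaN | 163.0 | NaN | Daegu | NaN | NaN | F |
| 4 | AhIn | Lee Ahin | 이아인 | 아인 | 27/09/1999 | MOMOLAND | 9/11/2016 | Double Kick | South Korea | NaN | 160.0 | 44.0 | Wonju | NaN | NaN | F |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1770 | Zico | Woo Jiho | 우지호 | 지코 | 14/09/1992 | Block B | 15/04/2011 | KQ | South Korea | NaN | 182.0 | 65.0 | Seoul | NaN | NaN | M |
| 1772 | Ziu | Park Heejun | 박희준 | 지우 | 16/06/1997 | VAV | 31/10/2015 | A team | South Korea | NaN | 185.0 | 70.0 | NaN | NaN | NaN | M |
| 1773 | ZN | Bae Jinye | 배진예 | 지엔 | 9/06/1994 | LABOUM | 27/08/2014 | NH | South Korea | NaN | 169.0 | 48.0 | Bucheon | UNI.T | NaN | F |
| 1774 | Zoa | Cho Hyewon | 조혜원 | 조아 | 31/05/2005 | Weeekly | 30/07/2020 | Play M | South Korea | NaN | 170.0 | NaN | NaN | NaN | NaN | F |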
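
Two details help when combining conditions. First, each condition needs its own parentheses, because `&` and `|` bind more tightly than comparison operators such as `>` and `==`. Second, `isin()` can replace a chain of `==` checks joined by `|`. A sketch using three rows from the data above:

```python
import pandas as pd

df = pd.DataFrame({
    'Stage Name': ['Baekhyun', 'Kai', 'Zico'],
    'Group': ['EXO', 'EXO', 'Block B'],
    'Height': [174.0, 182.0, 182.0],
})

# Parentheses around each condition are required
tall_exo = df[(df['Group'] == 'EXO') & (df['Height'] > 180)]
print(tall_exo['Stage Name'].tolist())  # ['Kai']

# isin() matches any value in the list
picked = df[df['Group'].isin(['EXO', 'Block B'])]
print(len(picked))                      # 3
```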


In [57]:
# selects the Stage Name, Group, and Height columns for rows where Height is greater than 185 and Gender is 'M' (the dataset codes gender as 'M'/'F', not 'male')

df.loc[(df['Height'] > 185) & (df['Gender'] == 'M'), ['Stage Name', 'Group', 'Height']]

|  | Stage Name | Group | Height |
|---|---|---|---|
| 51 | Baek Seung | EPEX | 186.0 |
| 578 | Hyunsuk | CIX | 188.0 |
| 695 | Jihun | KNK | 186.0 |
| 792 | Jukang | JWiiver | 186.0 |
| 896 | Kris | NaN | 187.0 |
| 915 | Kyungmin | 8TURN | 187.0 |
| 964 | Lou | VAV | 187.0 |
| 1235 | Seoham | KNK | 190.0 |
| 1415 | Sunghyun | IN2IT | 186.0 |
| 1479 | Takuya | CROSS GENE | 187.0 |
