**Coursebook: Python Beginner**
- Part 1 of Python Beginner DBS Indonesia Training
- Course Length: 6 hours
- Last Updated: February 2022

___

- Developed by [Algoritma](https://algorit.ma)'s product division and instructors team

# Background

## Top-Down Approach 

The coursebook is part of the **Data Analytics Specialization** offered by [Algoritma](https://algorit.ma). It takes a more accessible approach compared to Algoritma's core educational products, by getting participants to overcome the "how" barrier first, rather than a detailed breakdown of the "why". 

This translates to an overall easier learning curve, one where the reader is prompted to write short snippets of code at frequent intervals before being offered an explanation of the underlying theoretical frameworks. Instead of mastering the syntactic design of the Python programming language, then moving on to data structures, then the `pandas` library, then the mathematical details of an imputation algorithm, and finally its code implementation, we do the opposite: implement the imputation first, then give a succinct explanation of why it works and the practical considerations (what to look out for, which assumptions it makes, when _not_ to use it, etc.).

For the most part, experience in Python programming is good to have but not required. Familiarity with data manipulation and data structures in a different programming language is a welcome addition but, again, not required.

## Training Objectives

This coursebook is intended for participants new to the world of data analysis and / or programming. No prior programming knowledge is assumed. 

The coursebook focuses on:
- Introduction to Python
- Introduction to Python Notebook
- Introduction to the `pandas` library. 
- Introduction to `DataFrame`  
- Data Types
- Exploratory Data Analysis
- Indexing and Subsetting

The final part of this course is a Graded Assignment, where you are expected to apply all that you've learned to a new dataset and attempt the given questions.

## Python Basic Programming
### Setting up Environment

**How to create a new virtual environment:**

1. Open Command Prompt

2. Create new virtual environment with:
    ```
    python -m venv <ENV_NAME>
    ```
    For example: `python -m venv dbs_python`


3. Navigate to the `Scripts` folder inside the newly created environment by running this command:
    ```
    cd <ENV_NAME>\Scripts
    ```
    For example: `cd dbs_python\Scripts`
    
4. Activate the environment by running this command: `activate`
   
5. Install the packages you need using this command:
    ```
    pip install --proxy http://bankid:yourpassword@10.1.162.1:8080 <package>
    ```
    
6. Install Jupyter Notebook using the same package installation command:
 
    ```
    pip install --proxy http://bankid:yourpassword@10.1.162.1:8080 notebook
    ```
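Once the environment is activated, it can be reassuring to confirm that Python is really running from inside it. The snippet below is a minimal sketch of one way to check, relying on the standard-library `sys` module; the variable name `in_virtual_env` is only illustrative.

```python
import sys

# Inside a virtual environment created with `python -m venv`, sys.prefix points
# at the environment folder, while sys.base_prefix still points at the base
# Python installation. Outside an environment, the two are identical.
in_virtual_env = sys.prefix != sys.base_prefix

print("Running inside a virtual environment:", in_virtual_env)
print("Interpreter location:", sys.executable)
```

If the first line prints `False`, the environment is probably not activated in the current Command Prompt window.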
    


# Python for Data Analysts

Since you'll spend a great deal of your time working with data in Python Notebook, it's important to familiarize yourself with notebook documents (or "notebooks").

> Notebook documents are documents  which contain both computer code (e.g. python) and rich text elements (paragraph, equations, figures, links, etc…). Notebook documents are both human-readable documents containing the analysis description and the results (figures, tables, etc..) as well as executable documents which can be run to perform data analysis.

Python Notebook provides an easy-to-use, interactive data science environment that doesn't only work as an IDE, but also as a presentation tool.


## Working with Jupyter Notebook

### Markdown Cell and Code Cell

Cell types in a notebook:
1. Markdown 
2. Code

**Example Markdown Cell**

This is a markdown cell. We can write **bold** text, *italic* text, and even mathematical formulas such as:

\begin{equation}
f(x) = \frac{e^{-x}}{(1+e^{-x})}
\end{equation}

**Example Code Cell**

In [1]:
print('Hello 2023!')

Hello 2023!


## Command Mode and Edit Mode

There are 2 cell modes in notebooks:
1. Command mode (the cell border is BLUE)
     - `a` : add cell above
     - `b` : add cell below
     - `d` + `d` : delete selected cells
     - `x` : cut selected cells
     - `c` : copy selected cells
     - `v` : paste selected cells
     - `z` : undo
     - `m` : change cell type to markdown
     - `y` : change cell type to code
     - `enter` : enter Edit Mode
     - `h` : show keyboard shortcuts


2. Edit mode (the cell border is GREEN)
     - `Ctrl + Enter`: execute one cell
     - `Esc`: changes edit mode to command mode
    
Full list of shortcuts (command palette): **CTRL + SHIFT + P**

In edit mode, you can type any text into the cell.

In [2]:
2+3

5

## Variables and Keywords

- A **variable** is a name used to refer to a value. 

- The `=` sign is used to create a new variable. This process is often called **assignment**.

In [3]:
# create the variables panjang (length) and lebar (width)
panjang = 4 
lebar = 5

In [4]:
luas = panjang * lebar

In [5]:
luas

20

Note that, like many other programming languages, Python is **case-sensitive**, so `activity` and `Activity` are different symbols and will point to different variables.

In [None]:
# Luas
# raises an error because Python is case-sensitive

We can use `#` to write a comment in a code cell.

**Rules for naming variables in Python:**
- Use a combination of uppercase letters (A-Z), lowercase letters (a-z), digits (0-9), and the underscore (`_`).
- Special characters such as `!`, `$`, and `&` cannot be used in variable names.
- A name **must not** start with a digit.
- Names are **case-sensitive**, so `algoritma`, `ALGORITMA`, and `Algoritma` are 3 different variables.
- A name **must not** be a Python **keyword**.
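The rules above can be checked programmatically. The sketch below combines the built-in `str.isidentifier()` method with `keyword.iskeyword()`; the helper name `is_valid_variable_name` and the candidate names are hypothetical, chosen only for illustration. Note that `isidentifier()` is slightly more permissive than the rules listed, because it also accepts non-ASCII letters.

```python
import keyword

# hypothetical candidate names to test against the naming rules
candidate_names = ["luas_persegi", "Luas2", "2luas", "luas!", "from", "True"]


def is_valid_variable_name(name):
    """Return True when `name` is a legal identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


for name in candidate_names:
    print(name, "->", is_valid_variable_name(name))
```

Only `luas_persegi` and `Luas2` pass: `2luas` starts with a digit, `luas!` contains a special character, and `from` and `True` are keywords.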

In [None]:
# Python Keyword List
import keyword
keyword.kwlist

In [None]:
# shortcut to comment out several lines at once: Ctrl + /
# True = 100
# False = 200
# from = 300

## Types of variables

### String
In Python, the `String` data type represents text. It consists of a sequence of characters, which can include spaces and numbers. For example, the word "hamburger" and the phrase "I ate 3 hamburgers" are both strings. There are several ways to create a string object in Python:
- using single quote `''`
- using double-quote `""`
- using triple quote `'''` or `"""` 

In [None]:
# example using single quotes
kalimat1 = 'Hello World!'

In [None]:
# example using double quotes
kalimat2 = "I'm a programmer"

In [None]:
# example using triple quotes
kalimat3 = '''"I'm smart!", he said'''

In [None]:
kalimat1

In [None]:
# check the data type
type(kalimat1)

### Int and Float
`int` (integer) stores whole numbers without a decimal point, whereas `float` stores numbers with a decimal point. 
Example:
- `an_int = 1`
- `a_float = 1.1`

In [None]:
nilai1 = 5
nilai2 = 5.5

In [None]:
print(type(nilai1))
print(type(nilai2))

**Numeric Operations** \
Arithmetic operators:
- `+` - Addition
- `-` - Subtraction
- `*` - Multiplication
- `/` - Division
- `//` - Floor division (division rounded down to the nearest whole number) 
- `%` - Modulus (remainder of the division)
- `**` - Exponent (power)

Comparison operators:
- `<` - Less than (e.g. a < b)
- `<=` - Less than or equal to (e.g. a <= b)
- `>` - Greater than (e.g. a > b)
- `>=` - Greater than or equal to (e.g. a >= b)
- `==` - Equal to (e.g. a == b)
- `!=` - Not equal to (e.g. a != b)
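To see how the operators listed above behave, here is a short sketch using two illustrative values, `a = 7` and `b = 2` (these numbers are arbitrary and not taken from the course data).

```python
a = 7
b = 2

# arithmetic operators
print(a + b)   # 9
print(a - b)   # 5
print(a * b)   # 14
print(a / b)   # 3.5  (true division always returns a float)
print(a // b)  # 3    (floor division rounds the result down)
print(a % b)   # 1    (remainder left after dividing 7 by 2)
print(a ** b)  # 49   (7 to the power of 2)

# comparison operators always return a boolean
print(a > b)   # True
print(a == b)  # False
print(a != b)  # True
```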

In [None]:
3 ** 2

In [None]:
4 > 3

### Boolean

Boolean stores only 2 values i.e. `True` or `False`

In [None]:
a = True 
b = False 

In [None]:
type(a)

**Boolean Operation** \
Logical operators:
- `and` (example: `a and b` -> returns `True` if **a and b** are True)
- `or` (example: `a or b` -> returns `True` if at least **one** of a and b is True)
- `not` (example: `not a` -> negation of a, if `a = True`, then `not a` is `False`)
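The three logical operators can be tried out as follows. This is a small sketch with made-up variable names (`has_degree`, `is_manager`, `age`), meant only to illustrate the rules above.

```python
has_degree = True
is_manager = False

both = has_degree and is_manager    # False: `and` needs both sides to be True
either = has_degree or is_manager   # True: `or` needs at least one True
no_degree = not has_degree          # False: `not` flips the value

# comparisons produce booleans, so they can be combined in the same way
age = 34
working_age = age >= 18 and age < 60  # True

print(both, either, no_degree, working_age)
```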

In [None]:
# False or True
2<0 or 10>9

### List
A `list` stores multiple values in a single variable.

How to declare the `list` variable: \
`variable_name = [value1, value2, value3]`

In [None]:
# Create a python list
nilai = [10, 20, 30, 40, 50, 'lima']

In [None]:
type(nilai)

In [None]:
nilai.append(60)

In [None]:
nilai

**List Operation**
- `x.append(a)` : add a to the list with the variable name x
- `x.remove(a)` : removes a from the list with the variable name x

Built-in functions and operators:
- `len(x)` : returns the length of the list
- `a in b` : checks whether the value `a` is in the list `b`
- `max(x)` : gets the highest value in x
- `sum(x)` : gets the sum of the values in x

Another operation to be aware of on lists is indexing:
- `x[i]` : accesses the element of x at index `i` (counting starts at 0)
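The list operations above can be sketched on a small, purely numeric list. The list `scores` is hypothetical; a numeric-only list is used because `max()` and `sum()` would raise an error on a list that mixes numbers and text.

```python
scores = [10, 20, 30, 40, 50]  # hypothetical numeric list

scores.append(60)   # list is now [10, 20, 30, 40, 50, 60]
scores.remove(10)   # removes the first occurrence of the value 10

print(len(scores))   # 5
print(30 in scores)  # True
print(max(scores))   # 60
print(sum(scores))   # 200
print(scores[0])     # 20 (index 0 is the first element)
print(scores[-1])    # 60 (negative indexes count from the end)
```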

In [None]:
nilai

In [None]:
nilai[2]

In [None]:
nilai.append([70,80])

In [None]:
nilai

In [None]:
nilai.extend([70,80])

In [None]:
nilai

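Note: `append` accepts only one value, so `append([70, 80])` adds the whole list as a single nested element, whereas `extend([70, 80])` adds each value separately.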

Other python list method : https://www.w3schools.com/python/python_lists_methods.asp

# Introduction to pandas Library

## Working with DataFrame

We will start off by learning about a powerful Python data analysis library by the name of `pandas`. Its official documentation describes it as the "fundamental high-level building block for doing practical, real world data analysis in Python", and it strives to do so by implementing many of the key data manipulation functionalities in R. This makes `pandas` a core member of many Python-based scientific computing environments.

From its [official documentation](https://pandas.pydata.org):

> Python has long been great for data munging and preparation, but less so for data analysis and modeling. pandas helps fill this gap, enabling you to carry out your entire data analysis workflow in Python without having to switch to a more domain specific language like R.

To use `pandas`, we first load it with Python's `import` statement. Once it is imported under the alias `pd`, every `pandas` function can be accessed using the *pd.function_name* notation.

In [None]:
import pandas as pd

In [None]:
pd.__version__

> Note: Because we imported `pandas` as `pd`, its functions are called with syntax like `pd.function_name()`

### Read Data

To read a file in `.csv` format, use the `read_csv()` function.

Read `data.csv`, which is located in the `data_input` folder:

```
Syntax: pandas.read_csv(path/data)
```

In [None]:
# read data
pd.read_csv("data_input/data.csv", index_col=0)

In the code above, we used `.read_csv()` to read a csv file from a specified path. Notice that we set `index_col=0` so the first column in the csv is used as the index. By default, this function treats the first row as the header row. We can add `header=None` to the function call telling `pandas` to read in a CSV without headers.
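The effect of `index_col` and `header` is easier to see on a tiny example. The sketch below reads a hypothetical three-line CSV from memory with `io.StringIO` instead of a file on disk; the column names `id`, `amount`, and `grade` are made up for illustration.

```python
from io import StringIO

import pandas as pd

# hypothetical CSV content; StringIO lets pandas read it as if it were a file
csv_text = "id,amount,grade\n101,5000,A\n102,2500,B\n"

with_header = pd.read_csv(StringIO(csv_text))              # first row becomes the header
with_index = pd.read_csv(StringIO(csv_text), index_col=0)  # first column becomes the index
no_header = pd.read_csv(StringIO(csv_text), header=None)   # first row is treated as data

print(with_header.shape)        # (2, 3)
print(with_index.shape)         # (2, 2)
print(no_header.shape)          # (3, 3)
print(list(no_header.columns))  # [0, 1, 2]
```

With `header=None`, pandas numbers the columns itself, and the original header line shows up as the first data row.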

You may find it curious that we use `0` to reference the first element of an axis. This is because Python uses 0-based indexing, a behavior that differs from languages such as R and Matlab.

In [None]:
# read data without using index_col
loan = pd.read_csv("data_input/data.csv")

In [None]:
type(loan)

In [None]:
# remove the limit on the number of columns displayed
pd.set_option('display.max_columns', None)

In [None]:
loan.head()

#### Knowledge Check: Error

Recalling the earlier discussion of **Python keywords**, which of the following 4 lines of code will evaluate without raising an error?

- [ ] `pd.read_csv("data.csv", index_col=false)` -> raises an error: `false` is not defined (it should be `False`), and the file path is also wrong
- [ ] `Import pandas as pd` -> Python is case-sensitive; it should be `import`
- [X] `print(100-2)`
- [ ] `None = 2` -> `None` is a keyword 

In [None]:
## Your code below
print(100-2)
## -- Solution code

In [None]:
# list the variables stored in the current session
%whos

## Data Types

`pandas` allows data analysts to create Series objects and DataFrame objects. A Series represents a one-dimensional array, whereas a DataFrame emulates the functionality of "Data Frames" in R and is useful for tabular data. 

In practice, a large proportion of our data is tabular: when we import data from a relational database (MySQL, PostgreSQL) or from spreadsheet software (Google Sheets, Microsoft Excel), we can represent the data as a DataFrame object.

When we called `pd.read_csv()` earlier, `pandas` tried to infer the data type of each column from its values. Sometimes it gets it right, but more often than not, a data analyst's intervention is required. In the following sub-section, we'll learn about the techniques an analyst has at their disposal for handling pandas data types.

`dtypes` simply stands for "data types". Because `loan` is a `pandas` object, accessing the `dtypes` attribute will return a series with the data type of each column. 

In [None]:
# check data types 
loan.dtypes

----
#### Knowledge check: `.dtypes` and pandas attributes
Look at the following code. What output do you expect, and why?
```
x = [2019, 4, 'data science']
x.dtypes
```

In [None]:
## Your code below
x = [2019, 4, 'data science']
#x.dtypes

## -- Solution code
type(x)

Let's take a look at some examples of `DataFrame.dtypes`:

In [None]:
employees = pd.DataFrame({
    'name': ['Anita', 'Brian'],
    'age': [34, 29],
    'joined': [pd.Timestamp('20190410'), pd.Timestamp('20171128')],
    'degree': [True, False],
    'hourlyrate': [35.5, 29],
    'division': ['HR', 'Product']
})

In [None]:
## Your code below



Let's go through the columns and their data types from the above `DataFrame`:

- `name` [`object`]: store text values
- `age` [`int`]: integer values
- `joined` [`datetime`]: date and time values
- `degree` [`bool`]: True/False values
- `hourlyrate` [`float`]: floating point values
- `division` [`object`]: store text values

Among these columns, only `age` and `hourlyrate` are columns with numeric values. This is a simple, but important, observation to make as we make our way into the Exploratory Data Analysis phase. But before we do, let's do one more exercise. Take a closer look at the Data Frame we just created again.

Out of the 6 columns, one of them is of special interest to our next discussion, **categorical values**.

### Categorical Variables


When working with categories, it is recommended, from both a business and a technical point of view, to use the `pandas` categorical data type. From a business perspective, this adds clarity about the type of data the analyst is working with. This informs and guides the analysis, on questions such as which statistical methods or plot types to use.

From a technical viewpoint, the memory savings -- and in turn, computation speed as well as computational resources -- can be quite significant. 
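The memory savings can be measured directly. The sketch below builds a hypothetical column of 30,000 repeated division names, stores it once as plain `object` text and once as `category`, and compares the two with `Series.memory_usage(deep=True)`. The exact byte counts depend on the pandas version and platform, so none are quoted here.

```python
import pandas as pd

# hypothetical column: three distinct labels repeated many times
division_text = pd.Series(["HR", "Product", "Finance"] * 10_000, dtype="object")
division_cat = division_text.astype("category")

# deep=True counts the memory used by the string values themselves
bytes_as_text = division_text.memory_usage(deep=True)
bytes_as_category = division_cat.memory_usage(deep=True)

print("object  :", bytes_as_text, "bytes")
print("category:", bytes_as_category, "bytes")
print(list(division_cat.cat.categories))  # ['Finance', 'HR', 'Product']
```

The categorical version stores each distinct label once plus a small integer code per row, which is why it tends to be much smaller when a column has few unique values.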

One more important remark from the docs:

> Categoricals are a pandas data type corresponding to categorical variables in statistics. A categorical variable takes on a limited, and usually fixed, number of possible values (categories; levels in R). Examples are gender, social class, blood type, country affiliation or rating via Likert scales.

Can you spot which of our columns holds values that should be encoded in the `category` data type? Once you've spotted it, use the `astype('category')` method to perform the conversion. Remember to re-assign the converted column so that the original (`object`) column is overwritten with the new `category` column.

Examples:

```py
# convert marital_status to category
employees['marital_status'] = employees['marital_status'].astype('category')

# convert experience to integer
employees['experience'] = employees['experience'].astype('int')
```

In [None]:
## Your code below


Use `employees.dtypes` to confirm that you've done the exercise above correctly:

In [None]:
## Your code below


In most real-world projects, your work as a data analyst will involve working with **categorical**, **numeric** and **datetime** values; either treating them as "features" or "target". In the case of machine learning:

- A **categorical** target represents a classification problem
- A **numeric** target represents a regression problem

## Exploratory Data Analysis Tools

In simple words, exploratory data analysis (EDA) refers to the process of performing initial investigations on data, often with the objective of becoming familiar with certain characteristics of the data. This is usually done with the aid of summary statistics and simple graphical techniques that purposefully uncover the structure of our data.

We'll start off with some of the most convenient EDA tools built into `pandas`. In particular, this is a summary of what we'll cover in common EDA workflows:

- `.head()` and `.tail()`
- `.describe()`
- `.shape` and `.size`
- `.axes`
- `.dtypes`

**Method vs. Attribute**

- In terms of syntax:
    - A method is followed by parentheses `()`
    - An attribute is **not** followed by parentheses

- In terms of usage:
    - A method accepts parameters whose values can be changed
    - An attribute takes **no** parameters
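The distinction can be seen on a small DataFrame. In this sketch, `staff` is a hypothetical three-row table; `shape` is read as an attribute, while `head()` and `tail()` are called as methods with a parameter.

```python
import pandas as pd

# hypothetical data used only to contrast attributes and methods
staff = pd.DataFrame({"name": ["Anita", "Brian", "Citra"], "age": [34, 29, 41]})

dimensions = staff.shape    # attribute: no parentheses, no parameters
first_two = staff.head(2)   # method: parentheses, with the parameter n = 2
last_one = staff.tail(n=1)  # the parameter can also be passed by name

print(dimensions)  # (3, 2)
print(first_two)
print(last_one)
```

Writing `staff.shape()` would raise a `TypeError`, because `shape` holds a plain tuple rather than something that can be called.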

## `head()` and `tail()`

- `head(n)` inspects the top `n` rows of a DataFrame.
- `tail(n)` inspects the bottom `n` rows of a DataFrame.

In [None]:
# code here


## `.describe()`

The `describe()` method displays 8 descriptive summary statistics. By default, it summarizes only the numeric columns. 

The summary statistics are:
- Count: the number of non-missing values in the column
- Mean: the average value
- Standard Deviation: a measure of how far the values typically lie from the mean (the center of the data)
- Minimum Value: the smallest value in the data
- 25th Percentile (Q1)
- 50th Percentile (Q2/Median)
- 75th Percentile (Q3)
- Maximum Value: the largest value in the data
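As a quick illustration of these eight statistics, the sketch below calls `describe()` on a hypothetical one-column DataFrame that contains a missing value, which shows that the count covers only the non-missing entries.

```python
import numpy as np
import pandas as pd

# hypothetical hourly rates; one value is deliberately missing
rates = pd.DataFrame({"hourlyrate": [35.5, 29.0, np.nan, 42.0]})

summary = rates.describe()

print(summary)
print(list(summary.index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(len(rates), summary.loc["count", "hourlyrate"])  # 4 rows, but a count of 3.0
```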

In [None]:
# describe the employees data


Each of these statistics also has its own method, for example `count()`, `mean()`, `std()`, `min()`, `max()`, and `quantile()`.

In [None]:
# code here


Use the `include` or `exclude` parameter of `describe()` to see descriptive statistics for non-numeric variables:

In [None]:
# code here


## `shape` and `size`

`shape` and `size` are DataFrame attributes that describe the dimensions of the data (number of rows and columns) and its size (total number of cells).
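A minimal sketch of both attributes, using a hypothetical `stock` DataFrame with 3 rows and 2 columns:

```python
import pandas as pd

# hypothetical data: 3 rows and 2 columns
stock = pd.DataFrame({"item": ["bawang", "garam", "gula"], "units": [50, 40, 30]})

n_rows, n_cols = stock.shape  # shape is a (rows, columns) tuple

print(stock.shape)  # (3, 2)
print(stock.size)   # 6, which is rows multiplied by columns
```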

In [None]:
# check shape


In [None]:
# check size 


___

## `axes`

`axes` is a DataFrame attribute that returns the DataFrame's index labels (both the row index and the column index).
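The sketch below shows what `axes` returns for a hypothetical DataFrame with custom row labels: a list whose first element is the row index and whose second element is the column index.

```python
import pandas as pd

# hypothetical data with explicit row labels
stock = pd.DataFrame(
    {"item": ["bawang", "garam", "gula"], "units": [50, 40, 30]},
    index=["a", "b", "c"],
)

row_index, column_index = stock.axes  # a list of two Index objects

print(list(row_index))     # ['a', 'b', 'c']
print(list(column_index))  # ['item', 'units']
```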

In [None]:
# check axes


___

## `dtypes`

We've covered `.dtypes` in earlier sections, so go ahead and practice inspecting the data types of `loan` DataFrame. Are the columns in the right data types? If they are not, formulate a mental checklist of type conversion you need to perform.

In [None]:
# check data types of loan


Columns whose data types are not yet correct:
- ...
- ...

## `astype()`

In [None]:
# convert the data types in the loan data


___

## Knowledge Check: Data types

In [None]:
inventory = pd.DataFrame({
    'units_instock': [50, 40, 30],
    'discount_price': [15.0, 5, 7],
    'item_name': ['bawang', 'garam', 'gula'],
    'unit_sold': ['123', '456', '789']
})

Suppose we have a DataFrame named `inventory`.

1. Running `inventory.dtypes` displays the data type of each column (Series). Which of the columns below is likely to have the wrong data type?
 - [ ] A. `units_instock`: int64 
 - [ ] B. `discount_price`: float64
 - [ ] C. `item_name`: object
 - [ ] D. `unit_sold`: object
 
 
2. We want to know how many columns the `inventory` DataFrame has. Which of the commands below correctly displays the number of columns in `inventory`? More than one may be correct!
 - [ ] A. `print(len(inventory.columns))`
 - [ ] B. `print(inventory.shape[1])`
 - [ ] C. `print(len(inventory.axes[1]))`
 
 
3. Copy the code below, then run it. 
```
x = [2019, 4, 'data science']
x.dtypes
```
What happens?
Then run: 
```
type(x)
```
How does the result differ from the previous one?

## Indexing and Subsetting with Pandas

Using indexing operators to select, summarize or transform only a subset of data is a critical part of any data analysis workflow. Consider the following use-cases:

- Compare the sales in Year 2018 vs Year 2019  
- Identify missed opportunities in a specific market segment (e.g. Retail vs Wholesale)
- Best quarter of the year to execute cross-selling promos / discounts
- Study the profitability of goods in the higher price range (e.g. IDR45000000+) and how competitors' positioning affects sales in that price range

Notice that in all of these use-cases, data analysts will want to use some combination of indexing and then perform the necessary computations on that specific slice or slices of data. Unsurprisingly, `pandas` comes with a number of methods to help you accomplish this task.

In the following section, we'll take a closer look at some of the most common slicing and subsetting operations in `pandas`:
- `head()` and `tail()`  
- The `[]` operator
- `.loc`  
- `.iloc`
- Conditional subsetting

## Slicing: **`[]` operator**

Used to subset a DataFrame by slicing its **row** index.

The syntax is `[start:end]`, following Python's indexing rules (counting starts at 0), where `start` is inclusive and `end` is exclusive.
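The inclusive `start` and exclusive `end` rule is sketched below on a hypothetical `orders` DataFrame whose `order_no` column simply counts the rows from 1, so the selected rows are easy to read off.

```python
import pandas as pd

# hypothetical data: order_no equals the row's position counted from 1
orders = pd.DataFrame({"order_no": [1, 2, 3, 4, 5, 6]})

# rows 3 through 5 sit at positions 2, 3 and 4, so the slice stops at 5
third_to_fifth = orders[2:5]

print(third_to_fifth)
print(len(third_to_fifth))  # 3
```

A handy check is that `end - start` equals the number of rows returned.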

Using slicing, display the 1st and 2nd rows:

In [None]:
# slice the 1st and 2nd rows
loan[0:2]

### Knowledge Check: Slicing

Keeping in mind that `end` is exclusive when indexing, display rows 8 through 12 of the loan data.\
Choose the correct answer below!

- [ ] A. `loan[7:12]`
- [ ] B. `loan[8:12]`
- [ ] C. `loan[7:13]`
- [ ] D. `loan[8:13]`

In [None]:
# code here


## `.iloc` and `.loc`

With `.iloc` and `.loc` we can slice on both the **row and column** index. 

The fundamental difference between the two is:
- `.iloc` refers to the **position** (integer index) of the row or column, so it must be given **integers**, whereas
- `.loc` refers to the **name** (label) of the row or column
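The difference is sketched below on a hypothetical DataFrame with text row labels (`x1`, `x2`, `x3`), so that positions and labels cannot be confused. One detail worth noting: a `.loc` label slice includes its end label, whereas an `.iloc` position slice excludes its end position.

```python
import pandas as pd

# hypothetical data with text row labels
loans = pd.DataFrame(
    {"grade": ["A", "B", "C"], "amount": [5000, 2500, 12000]},
    index=["x1", "x2", "x3"],
)

by_position = loans.iloc[0:2, 0:1]          # rows 0-1, column 0 (end excluded)
by_label = loans.loc["x1":"x2", ["grade"]]  # rows x1 to x2 (end included), column "grade"
single_value = loans.iloc[0, 1]             # first row, second column

print(by_position)
print(by_label)
print(single_value)  # 5000
```

Both selections return the same two rows of the `grade` column.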

We can also use `:` to indicate no subsetting in a certain direction. The following code slices out the first 5 rows but takes all columns (pay attention to the use of the `:` operator): 

In [None]:
# iloc example


Suppose we want to retrieve one specific value: `2011-01-12`, found in the first row of the `issue_d` column.

In [None]:
## Your code below

## -- Solution code

`.loc`, in contrast to `.iloc`, subsets by _label_ rather than by _integer_ position. 

In [None]:
# loc example
# subset all rows for 'id' and 'home_ownership'


We can still use `integer` but our integers will be treated or interpreted as _labels_.
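To make the "integers as labels" idea concrete, the sketch below uses a hypothetical DataFrame whose row labels are the made-up ids 309, 101, and 205. With `.loc`, the number 101 is looked up as a label; with `.iloc`, a number is always a position.

```python
import pandas as pd

# hypothetical loans indexed by made-up ids (not taken from data.csv)
loans_by_id = pd.DataFrame({"amount": [5000, 2500, 12000]}, index=[309, 101, 205])
loans_by_id.index.name = "id"

first_row = loans_by_id.iloc[0]         # position 0, which is the row labelled 309
row_101 = loans_by_id.loc[101]          # label 101, which is the second row
two_rows = loans_by_id.loc[[309, 205]]  # a list of labels selects several rows

print(first_row["amount"])       # 5000
print(row_101["amount"])         # 2500
print(list(two_rows["amount"]))  # [5000, 12000]
```

Calling `loans_by_id.iloc[101]` would fail, because the DataFrame has no row at position 101.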

Let's read in the same `csv`, except this time we will set `id` as the row labels:

In [None]:
# code here
loan_id = pd.read_csv('data_input/data.csv', index_col = 0)

To subset the rows corresponding to loan ids 1077501 and 1076863, we can use label-based indexing (`.loc`) as follows:

In [None]:
# code here


### Conditional Subsetting

Along with `.iloc` and `.loc`, probably the most helpful type of subsetting would have to be conditional subsetting.

Syntax:

```
df[df['column_name'] <comparison_operator> <value>]
```

With conditional subsetting, we select data based on criteria we specified.
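Under the hood, the condition inside the brackets produces a boolean Series, and only the rows marked `True` are kept. The sketch below illustrates this on a hypothetical `loans` DataFrame; the column names `grade` and `amount` are assumptions made for the example.

```python
import pandas as pd

# hypothetical data for illustrating conditional subsetting
loans = pd.DataFrame({
    "grade": ["A", "B", "A", "C"],
    "amount": [5000, 2500, 12000, 800],
})

is_grade_a = loans["grade"] == "A"  # boolean Series: True, False, True, False
grade_a_loans = loans[is_grade_a]   # keeps only the rows marked True

large_loans = loans[loans["amount"] > 3000]  # the same idea written in one step

print(list(is_grade_a))
print(grade_a_loans)
print(large_loans)
```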

Get the data of all loan with RENT home ownership  

In [None]:
# code here


Get the data of loan with annual income greater than 30000 

In [None]:
# code here


We can also use the `&` and `|` operators to join conditions:

`sales[(sales.salesperson == 'Moana') & (sales.amount > 5000)]` subsets the rows where Moana has sold more than $5000 worth of items.
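A runnable sketch of joined conditions follows, using a hypothetical `sales` DataFrame with made-up values. Each condition is wrapped in parentheses because `&` and `|` bind more tightly than comparison operators. The last line uses `isin()`, a pandas method that is a compact alternative to chaining several `==` conditions with `|`.

```python
import pandas as pd

# hypothetical sales records
sales = pd.DataFrame({
    "salesperson": ["Moana", "Moana", "Kai", "Lani"],
    "amount": [7200, 1500, 9800, 3100],
})

# both conditions must hold (and)
moana_large = sales[(sales.salesperson == "Moana") & (sales.amount > 5000)]

# at least one condition must hold (or)
moana_or_large = sales[(sales.salesperson == "Moana") | (sales.amount > 5000)]

# membership in a list of values
kai_or_lani = sales[sales.salesperson.isin(["Kai", "Lani"])]

print(len(moana_large), len(moana_or_large), len(kai_or_lani))  # 1 3 2
```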

#### Knowledge Check: Multiple Conditional Subsetting

1. Get the data of loan with RENT home ownership and the annual income greater than 30000.

In [None]:
# code here


2. Get the loans whose `income_category` is Medium or High.

In [None]:
# code here


___

## Quiz

We will use `companies.csv` data for our quiz. 

Make sure to read the data to begin the quiz.

In [None]:
clients = pd.read_csv("data_input/companies.csv", index_col=0)
clients.head()

### Pre-processing

Pre-processing to remove the `IDR` part from data

In [None]:
clients[['Consulting Sales','Software Sales','Returns']] =\
clients[['Consulting Sales','Software Sales','Returns']].apply(lambda x: x.str.replace('IDR',''))
clients.head()

Pre-processing to remove the `,` (comma) from data

In [None]:
clients['Returns'] = clients['Returns'].str.replace(",","")
clients.head()

Pre-processing to change the data type into `int`

In [None]:
clients[['Consulting Sales','Software Sales','Returns']] =\
clients[['Consulting Sales','Software Sales','Returns']].astype('int')
clients.dtypes

### Case 1
As a Data Analyst, you want to analyze the total sales. To analyze it, create a new column in the DataFrame and name it `Total Sales`. This column is a sum of `Consulting Sales` and `Software Sales`. Use `head` or `tail` to peek at the resulting data frame to confirm that the output matches your expectation. 

---
What is the sum of the `Total Sales` column? Tips: Use the `.sum()` method on the columns to accumulate the total value!

        
    - [ ] 11,470,000
    - [ ] 19,238,903
    - [ ]  7,768,903

In [None]:
# code here


### Case 2
Based on the total sales obtained each year, you are currently focusing on analyzing sales from each client in 2017. Therefore, for now you will focus on companies that became clients in 2017. Use subsetting methods to get information on sales data that occurred in 2017.

---
Which company has the biggest `Total Sales` in 2017?

    - [ ] New Media Group
    - [ ] PT. Algoritma Data Indonesia
    - [ ] Palembang Konsultansi

In [None]:
# Code here


### Case 3
After that, we return to using all available data. It turns out that the company wants to run a campaign targeting companies whose sales value exceeds 1,500,000 IDR. Use subsetting again to find out which companies have sales exceeding 1,500,000 IDR. There are two such companies in the data.

---
Which companies have sales exceeding 1,500,000 IDR in the data?  


    - [ ] Palembang Konsultansi & PT. Surya Citra Manajemen
    - [ ] PT. Surya Citra Manajemen & New Media Group
    - [ ] Palembang Konsultansi & New Media Group