# Research On Seblak Sales At Tokopedia

*This program was created to see the consistency of seblak sales on Tokopedia, whether it could be a good business prospect or not.*

## A. Web Scraping

To analyze Seblak products on Tokopedia, the first step is to scrape the website. The scraping focuses on the search results for the keyword "Seblak".

The following fields are extracted from the Tokopedia search page:

- **Product Name**: Information regarding the names of available Seblak products.
- **Product Price**: Listed price of each Seblak product.
- **Seller Name**: Name of the seller marketing the product.
- **Store City**: Geographic location or hometown of the seller's shop.
- **Number Sold**: The total number of Seblak products that have been sold from each seller.
- **Product Rating**: Value or rating that reflects the quality of Seblak products.

By collecting and analyzing this data, we will have a strong basis for understanding sales trends, price developments, buyer preferences, and other factors that may influence the performance of Seblak products in the Tokopedia environment.

Import the required libraries, define the search URL, and set up the WebDriver:

In [31]:
import time
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

url = 'https://tokopedia.com/search?navsource=&page=1&q=seblak&srp_component_id=02.01.00.00&srp_page_id=&srp_page_title=&st='

driver = webdriver.Firefox(executable_path=r'/Users/mohdfattahillah/Documents/PERSONAL/HACKTIV8/geckodriver')
driver.get(url)                  

Running a web scraping process to collect Seblak product data from the Tokopedia website:

- Repeat 10 times to retrieve data from 10 pages.
- Wait for the "zeus-root" element to appear and scroll down.
- Parse pages using BeautifulSoup for data extraction.
- Retrieve information such as product name, product price, seller, shop city, quantity sold, and product rating.
- If a field is missing, it is replaced with the string 'None'.
- Each record is appended to the `data` list.
- Once the loop is complete, the data is converted into a DataFrame with appropriate columns.
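
The missing-value rule in the steps above can be factored into a small helper. This is a minimal sketch rather than the notebook's actual code: `text_or_default` is a hypothetical name, and `SimpleNamespace` stands in for a parsed element. It relies on the fact that BeautifulSoup's `find` returns `None` when nothing matches.

```python
from types import SimpleNamespace


def text_or_default(node, default="None"):
    """Return node.text, or `default` when the lookup found nothing (node is None)."""
    return node.text if node is not None else default


# Stand-ins for what a `find` call returns: an element with .text, or None.
found = SimpleNamespace(text="4.9")
missing = None

print(text_or_default(found))
print(text_or_default(missing))
```

With such a helper, each optional field needs only one lookup instead of a `findAll` length check followed by a second `find`.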

In [33]:
data = []
for i in range(10):

    WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#zeus-root")))
    time.sleep(2)

    for g in range(23):
        driver.execute_script("window.scrollBy(0, 250)")

    soup = BeautifulSoup(driver.page_source, "html.parser")
    for item in soup.findAll('div', class_='css-974ipl'):
        nama_produk = item.find('div', class_='css-3um8ox').text
        harga_produk = item.find('div', class_='css-1ksb19c').text
            
        sold = item.findAll('span', class_='css-1duhs3e')
        if len(sold) > 0:
            jumlah_terjual = item.find('span', class_='css-1duhs3e').text
        else:
            jumlah_terjual = 'None'
        
        rate = item.findAll('span', class_='css-t70v7i')
        if len(rate) > 0:
            rating_produk = item.find('span', class_='css-t70v7i').text
        else:
            rating_produk = 'None'

        for item2 in item.findAll('div', class_='css-1rn0irl'):
            kota_toko = item2.findAll('span', class_='css-1kdc32b')[0].text
            penjual = item2.findAll('span', class_='css-1kdc32b')[1].text

            data.append((penjual, kota_toko, nama_produk, harga_produk, jumlah_terjual, rating_produk))

    time.sleep(1)
    driver.find_element(By.CSS_SELECTOR, "button[aria-label^='Laman berikutnya']").click()
    time.sleep(2)

df = pd.DataFrame(data, columns=['Penjual', 'Kota Toko', 'Nama Produk', 'Harga Produk', 'Jumlah Terjual', 'Rating Produk'])

df

| | Penjual | Kota Toko | Nama Produk | Harga Produk | Jumlah Terjual | Rating Produk |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Militan Snack | Kab. Sidoarjo | Seblak Coet Instan Seblak Coet Rafael Viral Se... | Rp13.000 | Terjual 22 | |
| 1 | tsr1997 | Depok | seblak instan | Rp4.555 | Terjual 1 rb+ | 4.9 |
| 2 | Rasa Juara Indonesia | Bandung | SEBLAK JUARA INSTAN MASAK BASAH ASLI BANDUNG E... | Rp22.000 | Terjual 750+ | 4.9 |
| 3 | Lidigeli | Kab. Garut | MASJAY Seblak Bumbu Membara Instant - No.1 Str... | Rp15.000 | Terjual 70+ | 5.0 |
| 4 | Semesta kerupuk | Tangerang Selatan | Kerupuk mawar putih mentah 500gram / kerupuk i... | Rp12.000 | Terjual 60+ | 5.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 1051 | the Dhecip | Tangerang Selatan | Seblak Instan Pedas Home Made | Rp3.500 | Terjual 1 rb+ | 4.8 |
| 1052 | Baso Aci Ayang | Kab. Garut | Seblak Instan Komplit Pedas Gurih Nikmat | Rp5.999 | Terjual 2 rb+ | 4.9 |
| 1053 | BociKakang | Jakarta Selatan | SEBLAK KERING PEDAS DAUN JERUK | Rp4.500 | Terjual 500+ | 4.6 |
| 1054 | Aydaa Snack | Surakarta | SEBRING KRUPUK KERUPUK SEBLAK KERING PEDAS DAU... | Rp16.000 | Terjual 40+ | 4.9 |


To preserve the scraped data, the DataFrame is exported to CSV:

In [52]:
df.to_csv('seblak.csv')

## B. Data Preparation

### 1. Basic Data Exploration

After completing the web scraping process, we will then explore the data obtained. This data exploration aims to get a glimpse of the contents of the data we have, check whether there are missing values, check data types, etc.

Opening previously exported CSV data:

In [4]:
import pandas as pd

In [5]:
seblak = pd.read_csv('seblak.csv')

Count the missing values per column, then display 10 random data samples:

In [7]:
print(seblak.isnull().sum())

Unnamed: 0        0
Penjual           0
Kota Toko         0
Nama Produk       0
Harga Produk      0
Jumlah Terjual    0
Rating Produk     0
dtype: int64


In [17]:
seblak.sample(10)

| | Penjual | Kota Toko | Nama Produk | Harga Produk | Jumlah Terjual | Rating Produk |
| --- | --- | --- | --- | --- | --- | --- |
| 355 | jajanan masabi | Jakarta Selatan | baso aci masabi ( boci, tulang rangu, cuanki, ... | 7500.0 | 50.0 | 5.0 |
| 769 | Snack Zone Official | Jakarta Selatan | Kerupuk Seblak Pedas Mimi | 18000.0 | 70.0 | 4.9 |
| 302 | Distributor Topping Baso Aci | Bandung | Batagor Kering Bulat 50buah | 21000.0 | 4000.0 | 4.9 |
| 329 | Aydaa Snack | Surakarta | SEBRING KRUPUK KERUPUK SEBLAK KERING PEDAS DAU... | 16000.0 | 40.0 | 4.9 |
| 132 | lahawelah | Kab. Garut | GURILEM MINI isi 250gr Siomay Kering Toping Ba... | 7819.0 | 100.0 | 4.9 |
| 1025 | LinaRahayuPutriShop | Kab. Bandung | seblak sebul | 50000.0 | 17.0 | 5.0 |
| 930 | Gerai Snack Official Shop | Kab. Tangerang | Maicih Basreng | 15300.0 | 10000.0 | 4.8 |
| 944 | Aliya28 | Tangerang Selatan | Seblak/Cuanki/Baso aci/Batagor kuah (instan) 1... | 7500.0 | 100.0 | 5.0 |
| 827 | the Dhecip | Tangerang Selatan | Seblak Instan Pedas Home Made | 3500.0 | 1000.0 | 4.8 |
| 767 | 5Fingers | Tangerang Selatan | Seblak Basah Mommy/Camilan/Kerupuk/Snack/Cepat... | 13000.0 | 11.0 | 5.0 |


From the random sample above, seblak sellers are varied and come from many cities, which suggests a competitive market. In the sample we drew, prices ranged from IDR 9,216 to IDR 89,000; because the sample is random, the rows shown above may differ from run to run. The sold counts suggest that seblak is indeed popular, and the ratings are generally high.

Displays summary data:

In [18]:
seblak.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1056 entries, 0 to 1055
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Penjual         1056 non-null   object 
 1   Kota Toko       1056 non-null   object 
 2   Nama Produk     1056 non-null   object 
 3   Harga Produk    1056 non-null   float64
 4   Jumlah Terjual  1056 non-null   float64
 5   Rating Produk   1056 non-null   float64
dtypes: float64(3), object(3)
memory usage: 49.6+ KB


The data contains 1056 rows and 6 columns. Straight after scraping, every column is stored as an object (string). The summary above already shows `Harga Produk`, `Jumlah Terjual`, and `Rating Produk` as float64, which suggests it was captured after the cleaning cells in the next section had been run. Because every column has the same non-null count of 1056, there is no missing data in these columns.

Checking for missing values:

In [19]:
seblak.isnull()

| | Penjual | Kota Toko | Nama Produk | Harga Produk | Jumlah Terjual | Rating Produk |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... |
| 1051 | False | False | False | False | False | False |
| 1052 | False | False | False | False | False | False |
| 1053 | False | False | False | False | False | False |
| 1054 | False | False | False | False | False | False |


From the results above, it can be confirmed that there are no missing values in this data.

### 2. Data Cleaning

After completing a simple exploration, we will then clean the data according to future analysis needs.

Drop the first column, `Unnamed: 0`, which is the old index that `to_csv` wrote to the file:

In [8]:
seblak.drop(columns=seblak.columns[0], inplace=True)

Strip the text that prevents `Jumlah Terjual` from being cast to float ("Terjual", "+", and the " rb" thousands suffix), then change the dtype from string to float:

In [9]:
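seblak['Jumlah Terjual'] = (
    seblak['Jumlah Terjual']
    .str.replace('Terjual', '', regex=False)
    .str.replace('+', '', regex=False)       # literal '+', not a regex quantifier
    .str.replace(' rb', '000', regex=False)  # "rb" (ribu) means thousand
    .str.replace('None', '0', regex=False)   # missing values become 0
)

seblak['Jumlah Terjual'] = seblak['Jumlah Terjual'].astype(float)

seblak['Jumlah Terjual']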
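
The conversion rules described above can also be expressed as a single function that fails loudly on an unexpected label. This is a minimal sketch, not the notebook's method: `parse_sold` is a hypothetical helper, and the decimal-comma case (for example `1,5 rb`) is an assumption about labels that do not appear in the output shown here.

```python
import re


def parse_sold(label):
    """Convert a label such as 'Terjual 1 rb+' into a float count."""
    if label is None or label.strip() in ("", "None"):
        return 0.0  # missing values are coded as 0, as in the notebook
    text = label.replace("Terjual", "").replace("+", "").strip()
    # Digits, an optional decimal comma (assumed), and an optional "rb" suffix.
    match = re.fullmatch(r"(\d+(?:,\d+)?)\s*(rb)?", text)
    if match is None:
        raise ValueError(f"unrecognised sold label: {label!r}")
    number = float(match.group(1).replace(",", "."))
    multiplier = 1000 if match.group(2) else 1
    return number * multiplier


for sample in ["Terjual 22", "Terjual 750+", "Terjual 1 rb+", "None"]:
    print(sample, "->", parse_sold(sample))
```

Raising on unknown formats makes it easier to notice when the site changes its label wording, whereas chained replacements would only fail later at the `astype(float)` step.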


Do the same for `Rating Produk`: replace 'None' with 0, then change the dtype from string to float:

In [10]:
seblak['Rating Produk'] = (
    seblak['Rating Produk']
    .str.replace('None', '0')
)

seblak['Rating Produk'] = seblak['Rating Produk'].astype(float)
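
The cells above do not show how `Harga Produk` strings such as `Rp13.000` became floats such as `13000.0`. One possible way is sketched below, assuming the dot is always a thousands separator and the prefix is always `Rp`. `parse_price` is a hypothetical helper, not code from the notebook.

```python
def parse_price(label):
    """Convert a price label such as 'Rp13.000' into a float (13000.0)."""
    # Assumption: '.' only ever appears as a thousands separator.
    digits = label.replace("Rp", "").replace(".", "").strip()
    return float(digits)


for sample in ["Rp13.000", "Rp4.555", "Rp22.000"]:
    print(sample, "->", parse_price(sample))
```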

Checking data changes in the dataframe.

In [20]:
seblak

| | Penjual | Kota Toko | Nama Produk | Harga Produk | Jumlah Terjual | Rating Produk |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Militan Snack | Kab. Sidoarjo | Seblak Coet Instan Seblak Coet Rafael Viral Se... | 13000.0 | 22.0 | 0.0 |
| 1 | tsr1997 | Depok | seblak instan | 4555.0 | 1000.0 | 4.9 |
| 2 | Rasa Juara Indonesia | Bandung | SEBLAK JUARA INSTAN MASAK BASAH ASLI BANDUNG E... | 22000.0 | 750.0 | 4.9 |
| 3 | Lidigeli | Kab. Garut | MASJAY Seblak Bumbu Membara Instant - No.1 Str... | 15000.0 | 70.0 | 5.0 |
| 4 | Semesta kerupuk | Tangerang Selatan | Kerupuk mawar putih mentah 500gram / kerupuk i... | 12000.0 | 60.0 | 5.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 1051 | the Dhecip | Tangerang Selatan | Seblak Instan Pedas Home Made | 3500.0 | 1000.0 | 4.8 |
| 1052 | Baso Aci Ayang | Kab. Garut | Seblak Instan Komplit Pedas Gurih Nikmat | 5999.0 | 2000.0 | 4.9 |
| 1053 | BociKakang | Jakarta Selatan | SEBLAK KERING PEDAS DAUN JERUK | 4500.0 | 500.0 | 4.6 |
| 1054 | Aydaa Snack | Surakarta | SEBRING KRUPUK KERUPUK SEBLAK KERING PEDAS DAU... | 16000.0 | 40.0 | 4.9 |


Checks dtype changes.

In [21]:
seblak.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1056 entries, 0 to 1055
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Penjual         1056 non-null   object 
 1   Kota Toko       1056 non-null   object 
 2   Nama Produk     1056 non-null   object 
 3   Harga Produk    1056 non-null   float64
 4   Jumlah Terjual  1056 non-null   float64
 5   Rating Produk   1056 non-null   float64
dtypes: float64(3), object(3)
memory usage: 49.6+ KB


Exporting the cleaned dataframe to CSV.

In [31]:
seblak.to_csv('seblakclean.csv', index = False)

## C. Business Understanding/Problem Statement

**Specific:**
- *Find out whether selling seblak products on Tokopedia is a good business prospect.*

**Measurable:**
- *Sales consistency among the many sellers of seblak products on Tokopedia.*

**Achievable:**
- *By applying descriptive statistics and hypothesis tests to the quantity sold, product price, and product rating variables.*

**Relevant:**
- *By finding out the consistency of sales of seblak products on Tokopedia, later we can decide whether this is a good business prospect.*

**Time-Bound:**
- *This goal must be achieved within 4 days.*

***Examining the consistency of seblak sales on Tokopedia with descriptive statistics and hypothesis tests within these 4 days will help determine whether selling seblak is a good business prospect or not.***

## D. Analysis

Read the CSV file of previously cleaned data into the `sc` DataFrame.

In [54]:
sc = pd.read_csv('seblakclean.csv')

Check that the DataFrame loaded correctly.

In [55]:
sc

| | Penjual | Kota Toko | Nama Produk | Harga Produk | Jumlah Terjual | Rating Produk |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Militan Snack | Kab. Sidoarjo | Seblak Coet Instan Seblak Coet Rafael Viral Se... | 13000.0 | 22.0 | 0.0 |
| 1 | tsr1997 | Depok | seblak instan | 4555.0 | 1000.0 | 4.9 |
| 2 | Rasa Juara Indonesia | Bandung | SEBLAK JUARA INSTAN MASAK BASAH ASLI BANDUNG E... | 22000.0 | 750.0 | 4.9 |
| 3 | Lidigeli | Kab. Garut | MASJAY Seblak Bumbu Membara Instant - No.1 Str... | 15000.0 | 70.0 | 5.0 |
| 4 | Semesta kerupuk | Tangerang Selatan | Kerupuk mawar putih mentah 500gram / kerupuk i... | 12000.0 | 60.0 | 5.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 1051 | the Dhecip | Tangerang Selatan | Seblak Instan Pedas Home Made | 3500.0 | 1000.0 | 4.8 |
| 1052 | Baso Aci Ayang | Kab. Garut | Seblak Instan Komplit Pedas Gurih Nikmat | 5999.0 | 2000.0 | 4.9 |
| 1053 | BociKakang | Jakarta Selatan | SEBLAK KERING PEDAS DAUN JERUK | 4500.0 | 500.0 | 4.6 |
| 1054 | Aydaa Snack | Surakarta | SEBRING KRUPUK KERUPUK SEBLAK KERING PEDAS DAU... | 16000.0 | 40.0 | 4.9 |


Import the libraries needed to carry out the analysis.

In [56]:
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

### 1. Calculate the mean, median, standard deviation, skewness, and kurtosis values.

To achieve the desired goal, we must first calculate the average, median, standard deviation, skewness and kurtosis values from the data we have, so we can find out the type of distribution and outliers in our data.

In [40]:
#calculate some descriptive statistics from the three variables in the dataset.

mean = sc[["Harga Produk", "Jumlah Terjual", "Rating Produk"]].mean()
median = sc[["Harga Produk", "Jumlah Terjual", "Rating Produk"]].median()
std_dev = sc[["Harga Produk", "Jumlah Terjual", "Rating Produk"]].std()
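# skew and kurtosis were never imported by name; call them through scipy.stats instead.
skewness = sc[["Harga Produk", "Jumlah Terjual", "Rating Produk"]].apply(stats.skew)
kurt = sc[["Harga Produk", "Jumlah Terjual", "Rating Produk"]].apply(stats.kurtosis)  # Fisher (excess) kurtosis by default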

print("\nMean:\n", mean)
print("\nMedian:\n", median)
print("\nStandard Deviation:\n", std_dev)
print("\nSkewness:\n", skewness)
print("\nKurtosis:\n", kurt)


Mean:
 Harga Produk      18524.874053
Jumlah Terjual      531.629735
Rating Produk         4.519034
dtype: float64

Median:
 Harga Produk      12500.0
Jumlah Terjual       80.0
Rating Produk         4.9
dtype: float64

Standard Deviation:
 Harga Produk      20510.598187
Jumlah Terjual     1590.019960
Rating Produk         1.270656
dtype: float64

Skewness:
 Harga Produk      3.461668
Jumlah Terjual    5.015376
Rating Produk    -3.217709
dtype: float64

Kurtosis:
 Harga Produk      16.179916
Jumlah Terjual    25.948107
Rating Produk      8.548640
dtype: float64


From the results of the calculations above, we can conclude:

**Average value:**
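- ***Product Price:*** *The average price of a seblak product on Tokopedia is about IDR 18,525, which seems cheap compared with similar snacks/food products. No price data for other snacks was collected here, so this comparison is an impression rather than a measured result.*

- ***Number Sold:*** *The average number sold per seblak product is about 532, which is high and perhaps affected by virality.*

- ***Product Rating:*** *The average rating of the seblak products sold is also good at about 4.5. Products without a rating were coded as 0 during cleaning, which pulls this average down, so rated products score even higher. This indicates that buyers are very happy with the products.*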

**Median Value:**
- ***Product Price:*** *The median price of seblak products sold on Tokopedia is IDR 12,500. This middle price can be used as a reference if we want to sell similar products later.*

- ***Number Sold:*** *The median of 80 pcs per product gives a more robust picture of typical sales performance because it is not affected by extreme values. It is far below the mean of about 532, but still quite good.*

- ***Product Rating:*** *The median rating is 4.9, so half of the products are rated 4.9 or higher, close to the maximum of 5.0, and the other half are rated 4.9 or lower.*

**Standard Deviation Value:**
- ***Product Price:*** *The standard deviation (about IDR 20,511) is larger than the mean, so the price of seblak products on Tokopedia varies greatly from seller to seller.*

- ***Number Sold:*** *The standard deviation (about 1,590) is roughly three times the mean, so sales performance also varies widely: some products sell poorly, some quite well, and a few very well.*

- ***Product Rating:*** *The standard deviation is about 1.27 on a 0 to 5 scale. Much of this spread likely comes from unrated products coded as 0 during cleaning; among rated products the ratings are more or less uniform.*

**Skewness Value:**
- ***Product Price:*** *The skewness is positive, so the distribution is skewed to the right: a few seblak products are priced well above the average, which produces a long right tail.*

- ***Number Sold:*** *The skewness is also positive, so the distribution is skewed to the right with outliers: the majority of seblak products on Tokopedia have a low number sold, while a few products have a very high number of sales.*

- ***Product Rating:*** *The skewness is negative, so the distribution is skewed to the left: most products have a high rating, and a few products have a low rating.*

**Kurtosis Value:**
- ***Product Price:*** *The kurtosis is far above that of a normal distribution (3 under the Pearson definition, 0 under the Fisher "excess" definition that `scipy.stats.kurtosis` uses by default). This indicates heavy tails and possibly significant outliers in the product price distribution.*

- ***Number Sold:*** *The kurtosis is also far above the normal benchmark, which again indicates possibly significant outliers in the tail of the number-sold distribution.*

- ***Product Rating:*** *The kurtosis is above the normal benchmark as well, which indicates that product ratings are concentrated around a certain value with thicker tails.*

In short, none of the three variables is normally distributed (or symmetrical): the skewness values are far from 0 and the kurtosis values are well above the normal benchmark. The data also tends to have outliers.
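
To make the skewness and kurtosis figures easier to interpret, the sketch below computes both from central moments using only the standard library. It uses population (biased) moments, which is also the default in `scipy.stats.skew` and `scipy.stats.kurtosis`. The function names and the toy data are illustrative assumptions, not part of the notebook.

```python
def central_moment(values, k):
    """k-th central moment: the mean of (x - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Third standardized moment; 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def pearson_kurtosis(values):
    """Fourth standardized moment; 3 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


def excess_kurtosis(values):
    """Fisher definition: Pearson kurtosis minus 3, so a normal distribution scores 0."""
    return pearson_kurtosis(values) - 3.0


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 3, 4, 50]  # one extreme value, like a best-selling product

print("symmetric skewness:", skewness(symmetric))
print("right-tailed skewness:", skewness(right_tailed))
print("symmetric excess kurtosis:", excess_kurtosis(symmetric))
```

A single extreme value is enough to turn the skewness positive, which is the pattern seen in the price and number-sold columns.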

### 2. Confidence Interval/Level

Here we ask what range the average price and the average quantity sold are likely to fall in if we sell seblak products on Tokopedia. In this step we calculate the lower and upper bounds of a confidence interval for each mean.

In [64]:
#define several variables from the results of previous calculations.

mean_harga_produk = 18524.874053
std_harga_produk = 20510.598187
mean_jumlah_terjual = 531.629735
std_jumlah_terjual = 1590.019960
confidence_level = 0.95

#calculate confidence intervals using the normal distribution approach (z-score).

z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)

harga_produk_lower = mean_harga_produk - z_score * (std_harga_produk / (len(sc) ** 0.5))
harga_produk_upper = mean_harga_produk + z_score * (std_harga_produk / (len(sc) ** 0.5))

jumlah_terjual_lower = mean_jumlah_terjual - z_score * (std_jumlah_terjual / (len(sc) ** 0.5))
jumlah_terjual_upper = mean_jumlah_terjual + z_score * (std_jumlah_terjual / (len(sc) ** 0.5))

print(f"Confidence interval for Harga Produk: [{harga_produk_lower:.2f}, to: {harga_produk_upper:.2f}]")
print(f"Confidence interval for Jumlah Terjual per month: [{jumlah_terjual_lower:.2f}, to: {jumlah_terjual_upper:.2f}]")


Confidence interval for Harga Produk: [17287.80, to: 19761.94]
Confidence interval for Jumlah Terjual per month: [435.73, to: 627.53]


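From the results above, the average price of a seblak product on Tokopedia is estimated to lie between IDR 17,287 and IDR 19,761, which we use as a rough indication of the income per product sold.

We can also estimate that the average number sold lies between 435 pcs and 627 pcs per product. The scraped "Terjual" label does not state a period, so reading this as a monthly figure is an assumption.

Please remember that we are using a confidence level of 95% and a normal approximation for the sample mean. The raw data is strongly skewed, but with 1056 rows the approximation is reasonable for the mean. The interval describes the average, not the range of individual products.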
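
The interval can be re-derived with only the standard library, which makes the formula explicit: mean ± z × std / √n. This is a sketch under the same inputs as the cell above (mean, standard deviation, and n = 1056 taken from the earlier outputs); `mean_confidence_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(mean, std, n, confidence=0.95):
    """Normal-approximation confidence interval for a sample mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * std / sqrt(n)                           # z times the standard error
    return mean - margin, mean + margin


price_low, price_high = mean_confidence_interval(18524.874053, 20510.598187, 1056)
sold_low, sold_high = mean_confidence_interval(531.629735, 1590.019960, 1056)

print(f"Harga Produk: [{price_low:.2f}, {price_high:.2f}]")
print(f"Jumlah Terjual: [{sold_low:.2f}, {sold_high:.2f}]")
```

Because the margin shrinks with √n, the interval is narrow even though individual prices and sales counts vary widely.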

### 3. Is there a difference in the price of seblak products in Jabodetabek and outside Jabodetabek?

At this analysis stage, we will carry out Hypothesis Testing to answer the questions above.

First we state H0 and H1:
- H0: There is no significant difference between the price of seblak products in the Jabodetabek area and areas outside Jabodetabek.
- H1: There is a significant difference between the price of seblak products in the Jabodetabek area and areas outside Jabodetabek, for example due to differences in raw material costs in the two locations.

List the unique store cities.

In [69]:
sc["Kota Toko"].unique()

array(['Kab. Sidoarjo', 'Depok', 'Bandung', 'Kab. Garut',
       'Tangerang Selatan', 'Jakarta Barat', 'Jakarta Pusat',
       'Jakarta Selatan', 'Surabaya', 'Kab. Bandung', 'Kab. Tangerang',
       'Kab. Sumedang', 'Jakarta Timur', 'Kab. Bogor', 'Jakarta Utara',
       'Bekasi', 'Surakarta', 'Tangerang', 'Kab. Bekasi',
       'Kab. Bandung Barat', 'Cimahi', 'Medan', 'Palembang',
       'Kab. Bantul', 'Kab. Tasikmalaya', 'Semarang', 'Tasikmalaya',
       'Pasuruan', 'Sukabumi', 'Kab.Ciamis', 'Kab. Purwakarta', 'Batam',
       'Malang', 'Kab. Jember', 'Kab. Sleman', 'Bogor', 'Kab. Brebes',
       'Kab. Karawang', 'Kab. Sukabumi', 'Denpasar', 'Kab. Majalengka',
       'Kab. Banyuwangi', 'Kab. Mojokerto', 'Banjar', 'Kab. Subang',
       'Kab. Malang', 'Kab. Indramayu', 'Banjarbaru', 'Cirebon'],
      dtype=object)

Perform filtering between cities that are included in Jabodetabek and outside Jabodetabek.

In [77]:
kota_jabodetabek = ["Depok", "Tangerang Selatan", "Jakarta Barat", 
"Jakarta Pusat", "Jakarta Selatan", "Kab. Tangerang", "Jakarta Timur", "Kab. Bogor", 
"Jakarta Utara", "Bekasi", "Tangerang", "Kab. Bekasi", "Bogor"]
# Exact membership avoids accidental regex/substring matches on city names.
in_jabodetabek = sc['Kota Toko'].isin(kota_jabodetabek)
jabodetabek = sc[in_jabodetabek]
luar_jabodetabek = sc[~in_jabodetabek]
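
When filtering by city name, substring (regular-expression) matching and exact membership can give different results. The sketch below illustrates the difference with plain Python. The city list is shortened for illustration, and "Bogor Baru" is a made-up name used only to show an unintended match.

```python
import re

# Shortened, illustrative list; the notebook's kota_jabodetabek list is longer.
jabodetabek_cities = ["Depok", "Bogor", "Bekasi", "Tangerang"]
pattern = "|".join(jabodetabek_cities)


def matches_by_substring(city):
    """Regex search, the semantics pandas' str.contains uses by default."""
    return re.search(pattern, city) is not None


def matches_exactly(city):
    """Exact membership, the semantics of pandas' isin."""
    return city in jabodetabek_cities


for city in ["Bogor", "Kab. Bogor", "Bogor Baru", "Bandung"]:
    print(f"{city!r}: substring={matches_by_substring(city)}, exact={matches_exactly(city)}")
```

Exact matching requires every variant (such as "Kab. Bogor") to be listed explicitly, but it never picks up a city that merely contains a listed name.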


Here we use an independent two-sample t-test to compare the mean product price of the two groups.

In [84]:
#conduct an independent t-test between product prices in the Jabodetabek area and outside the area.

# Split prices into the two groups with a boolean mask instead of a row-by-row loop.
in_jabodetabek = sc["Kota Toko"].isin(kota_jabodetabek)
harga_jabodetabek = sc.loc[in_jabodetabek, "Harga Produk"]
harga_luar_jabodetabek = sc.loc[~in_jabodetabek, "Harga Produk"]

t_statistic, p_value = stats.ttest_ind(harga_jabodetabek, harga_luar_jabodetabek)

alpha = 0.05

print("t-statistic:", t_statistic)
print("p-value:", p_value)

if p_value < alpha:
    print("We can reject the null hypothesis.")
    print("There is a significant difference in product prices between Jabodetabek and outside Jabodetabek.")
else:
    print("Not enough evidence to reject the null hypothesis.")
    print("There is no significant evidence of a difference in product prices between Jabodetabek and outside Jabodetabek.")

t-statistic: 0.03492829510350853
p-value: 0.9721435316209831
Not enough evidence to reject the null hypothesis.
There is no significant evidence of a difference in product prices between Jabodetabek and outside Jabodetabek.


From the hypothesis test above (p-value ≈ 0.97, far above alpha = 0.05), we fail to reject H0: there is no significant difference between the prices of seblak products in Jabodetabek and outside Jabodetabek.
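
For readers who want to see what the t-statistic measures, here is a minimal standard-library sketch of the pooled-variance (Student) version, which is what `scipy.stats.ttest_ind` computes by default. The p-value needs the t distribution, which is why the notebook relies on SciPy. The function name and the toy price lists are illustrative assumptions, not data from the notebook.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's two-sample t-statistic with pooled variance, plus degrees of freedom."""
    na, nb = len(a), len(b)
    # Weighted average of the two sample variances.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, na + nb - 2


# Toy groups whose means differ by exactly one unit.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]

t_stat, dof = pooled_t_statistic(group_a, group_b)
print("t-statistic:", t_stat, "degrees of freedom:", dof)
```

A t-statistic close to 0, like the 0.035 obtained above, means the difference between group means is tiny relative to the spread within the groups.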

### 4. Is it true that people prefer products that are cheaper?

To be able to answer this question, we need to do a correlation test.

In [89]:
#conduct a statistical test to see whether there is a relationship between the product price and the product rating in the dataframe.

corr_r, pval_p = stats.pearsonr(sc['Harga Produk'], sc['Rating Produk'])

print(corr_r)
print(pval_p)

if pval_p < 0.05:
    if corr_r > 0:
        print("There is a positive correlation between product price and product rating.")
    elif corr_r < 0:
        print("There is a negative correlation between product price and product rating.")
    else:
        print("There is no correlation between product price and product rating.")
else:
    print("Cannot conclude that there is a correlation between product price and product rating.")

-0.050982572471280276
0.0977517492193262
Cannot conclude that there is a correlation between product price and product rating.


The Pearson correlation between product price and product rating is very weak (r ≈ -0.05) and not statistically significant (p ≈ 0.098 > 0.05). Using rating as a proxy for preference, the data gives no evidence that people prefer seblak products that are cheaper than others.
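
To show what the coefficient measures, the sketch below computes Pearson's r from scratch and squares the notebook's r to obtain the share of variance explained. `pearson_r` is a hypothetical helper and the toy lists are illustrative; only `r_notebook` comes from the output above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


print("perfect positive:", pearson_r([1, 2, 3], [2, 4, 6]))
print("perfect negative:", pearson_r([1, 2, 3], [6, 4, 2]))

# r reported by the notebook for price vs. rating.
r_notebook = -0.050982572471280276
print(f"share of rating variance explained by price: {r_notebook ** 2:.4f}")
```

Squaring r shows that price accounts for well under 1% of the variation in rating, which is consistent with the conclusion that price has little bearing on how products are rated.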

## E. Conclusion

From the four analyses above, we conclude that selling seblak products on Tokopedia looks promising.

Sales of seblak were helped by the product going viral through a celebrity in Indonesia. However, the average rating of about 4.5 suggests that consumers like the product not only because it is viral, but also because it is delicious and popular.

In terms of price, seblak products sold on Tokopedia still vary widely, and consumers do not appear to be strongly influenced by whether a product is cheap or expensive, because there is no significant correlation between product price and product rating.

In terms of sellers, seblak is sold not only from Jabodetabek but also from many other cities. This again suggests that seblak is liked by Indonesian people in general, not only in big cities.

In terms of sales, the average number sold is above 500 pcs per product, which this analysis treats as a monthly figure. This suggests that demand is sustained and reinforces the point that seblak sells not only because it went viral. However, the median is only 80 pcs, so many sellers sell well below 500 pcs and a small number of best-sellers account for much of the volume.

In general, we can start selling seblak products on Tokopedia, but we still need to look at which cities have more consumers than others so that we can choose the right location to sell from. In terms of price, we can set a price above the average, provided we offer products that are tastier than others, more attractive packaging, and effective promotion.