
# **1. Introduction to Python**
Python is a general-purpose programming language that is widely used for data analysis, statistics, and visualization.
Its ecosystem includes popular libraries such as NumPy, pandas, Matplotlib, and scikit-learn for data analysis and machine learning.



# **2. Data Types in Python**
## **a. List**
A list is one of Python's basic built-in data structures: an ordered, mutable sequence that can store elements of different types.


In [1]:

# Creating a list
numbers = [1, 2, 3, 4]
text = ["A", "B", "C", "D"]
logic = [True, False, True]

print(numbers)
print(text)
print(logic)

# Operations on lists
print(numbers * 2)  # Repeat the whole list twice (not element-wise multiplication)
print(numbers[1:3])  # Slice: elements at indices 1 and 2 (the end index 3 is excluded)


[1, 2, 3, 4]
['A', 'B', 'C', 'D']
[True, False, True]
[1, 2, 3, 4, 1, 2, 3, 4]
[2, 3]
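The cell above uses only one type per list, so the following minimal sketch (with made-up values) shows a list that mixes types, how a list can be modified in place, and how to get element-wise arithmetic, which `numbers * 2` does not provide.

```python
# A single list can hold values of different types
mixed = [1, "A", True, 2.5]
print(mixed)

# Lists are mutable: elements can be replaced, appended, and deleted
mixed[0] = 100        # replace the element at index 0
mixed.append("new")   # add an element at the end
del mixed[2]          # delete the element at index 2 (True)
print(mixed)          # [100, 'A', 2.5, 'new']

# `*` repeats a list; element-wise arithmetic needs a comprehension
numbers = [1, 2, 3, 4]
doubled = [n * 2 for n in numbers]
print(doubled)        # [2, 4, 6, 8]
```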



## **b. Array (NumPy)**
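A NumPy array (`ndarray`) is an n-dimensional data structure whose elements share a single data type; it supports fast, element-wise mathematical operations. The example below creates a two-dimensional array (a matrix).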


In [2]:

import numpy as np

# Creating an array
mat = np.array([[1, 2, 3], [4, 5, 6]])
print(mat)

# Operations on arrays
print(mat.shape)  # Shape: (rows, columns)
print(mat[:, 1])  # Second column (column index 1)


[[1 2 3]
 [4 5 6]]
(2, 3)
[2 5]
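To illustrate the "mathematical operations" mentioned above, this short sketch reuses the same matrix and shows element-wise arithmetic, aggregation along an axis, and an array with more than two dimensions.

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 5, 6]])

# Arithmetic on arrays is element-wise (a list * 2 would repeat the list instead)
doubled = mat * 2
print(doubled)

# Aggregations can run over the whole array or along one axis
col_sums = mat.sum(axis=0)    # sum down each column
print(col_sums)               # [5 7 9]
print(mat.mean())             # 3.5

# Arrays are not limited to two dimensions
cube = np.zeros((2, 3, 4))
print(cube.ndim, cube.shape)  # 3 (2, 3, 4)
```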



## **c. Data Frame (pandas)**
A DataFrame is a two-dimensional, tabular data structure from the pandas library, with labeled rows (the index) and labeled columns. It is the main structure pandas uses for data manipulation.


In [3]:

import pandas as pd

# Creating a DataFrame
data = pd.DataFrame({
    "Name": ["Andi", "Budi"],
    "Age": [25, 30]
})
print(data)


   Name  Age
0  Andi   25
1  Budi   30
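A few common ways to inspect and query the same DataFrame are sketched below; the filter threshold of 26 is an arbitrary example value.

```python
import pandas as pd

data = pd.DataFrame({
    "Name": ["Andi", "Budi"],
    "Age": [25, 30]
})

print(data.shape)              # (2, 2): 2 rows, 2 columns
print(data.dtypes)             # data type of each column
print(data["Age"].mean())      # 27.5
print(data.loc[1])             # the row whose index label is 1 (Budi)

# Keep only the rows that satisfy a condition
older = data[data["Age"] > 26]
print(older)
```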



# **3. Reading Data into Python**
## **a. Excel**
Reading `.xlsx` files requires both pandas and openpyxl (the engine pandas uses for `.xlsx` files), so install them first if needed.


In [4]:

# !pip install pandas openpyxl

# Reading an Excel file
data_excel = pd.read_excel("Data_excel_mainan.xlsx")
print(data_excel.head())


FileNotFoundError: [Errno 2] No such file or directory: 'Data_excel_mainan.xlsx'
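The `FileNotFoundError` above means pandas could not find the file relative to the current working directory. A minimal sketch of checking for the file before reading it is shown below; `read_excel_if_exists` is a hypothetical helper written for this example, not a pandas function.

```python
from pathlib import Path

import pandas as pd


def read_excel_if_exists(path):
    """Hypothetical helper: return a DataFrame, or None if the file is missing."""
    file = Path(path)
    if not file.exists():
        # Show the absolute path pandas would have looked for
        print(f"File not found: {file.resolve()}")
        print(f"Current working directory: {Path.cwd()}")
        return None
    return pd.read_excel(file)


data_excel = read_excel_if_exists("Data_excel_mainan.xlsx")
```

If the file lives elsewhere, passing its full path (or moving it into the working directory) should resolve the error.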


## **b. CSV**


In [None]:

# Reading a CSV file
data_csv = pd.read_csv("Data_csv_mainan.csv")
print(data_csv.head())
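Because the CSV file is not included here, the sketch below reads made-up CSV text through `io.StringIO`, which `pd.read_csv` accepts in place of a file path. It also shows the `sep` parameter for files that use a semicolon instead of a comma.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a real file
csv_text = """Name,Age,Score
Andi,25,100
Budi,30,85
"""

data_csv = pd.read_csv(io.StringIO(csv_text))
print(data_csv.head())
print(data_csv.columns.tolist())  # ['Name', 'Age', 'Score']

# Files that separate values with ';' need sep=";"
semicolon = pd.read_csv(io.StringIO("Name;Age\nAndi;25\n"), sep=";")
print(semicolon)
```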



# **4. DataFrame Manipulation**
## **a. Selecting Columns from a DataFrame**
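Select a column by its name, e.g. `df["Name"]`, or by its integer position with `iloc`, e.g. `df.iloc[:, 1]` for the second column (positions start at 0).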


In [None]:

# Selecting the "Name" column
print(data_excel["Name"])

# Selecting the second column
print(data_csv.iloc[:, 1])
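Since the cell above depends on files that may be missing, here is a self-contained sketch of the same selection styles on a small DataFrame; the `City` column and its values are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Andi", "Budi"],
    "Age": [25, 30],
    "City": ["Bandung", "Jakarta"],  # made-up values
})

name_col = df["Name"]              # one column by label -> Series
second_col = df.iloc[:, 1]         # one column by position (0-based) -> "Age"
subset = df[["Name", "City"]]      # several columns by label -> DataFrame
by_loc = df.loc[:, "Age":"City"]   # label slice; the end label IS included

print(type(name_col).__name__, type(subset).__name__)  # Series DataFrame
print(second_col.name)             # Age
print(by_loc.columns.tolist())     # ['Age', 'City']
```

Single brackets return a Series, while a list of names inside the brackets returns a DataFrame.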



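## **b. Adding and Merging Columns**
A new column can be added by assigning a list with one value per row to a new column name. To combine the columns of two DataFrames side by side, use `pd.concat()` with `axis=1`.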


In [None]:

# Adding a new column
data_excel["Score"] = [100, 85]  # one value per row; assumes data_excel has 2 rows
print(data_excel)
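The `pd.concat()` approach mentioned above can be sketched with two small made-up DataFrames. Note that `concat` aligns rows by index label rather than by position, so mismatched labels produce missing values (`NaN`).

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Andi", "Budi"], "Age": [25, 30]})
scores = pd.DataFrame({"Score": [100, 85]})

# axis=1 places the columns side by side; rows are matched on the index
combined = pd.concat([people, scores], axis=1)
print(combined)

# Labels that exist in only one input get NaN in the other input's columns
shifted = pd.DataFrame({"Score": [100, 85]}, index=[1, 2])
misaligned = pd.concat([people, shifted], axis=1)
print(misaligned)  # 3 rows (index 0, 1, 2)
```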



# **5. Saving Data from Python**


In [None]:

# index=False: do not write the row index as an extra column
data_excel.to_excel("data.xlsx", index=False)  # Excel (requires openpyxl)
data_excel.to_csv("data.csv", index=False)  # CSV
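To see why `index=False` is used, the sketch below saves a small made-up DataFrame to CSV in a temporary directory with and without the index, then reads both files back. With the default (`index=True`), the index comes back as an extra unnamed column.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"Name": ["Andi", "Budi"], "Score": [100, 85]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = Path(tmp) / "with_index.csv"
    without_index = Path(tmp) / "without_index.csv"

    df.to_csv(with_index)                  # default: index is written
    df.to_csv(without_index, index=False)  # index is not written

    cols_with = pd.read_csv(with_index).columns.tolist()
    cols_without = pd.read_csv(without_index).columns.tolist()

print(cols_with)     # ['Unnamed: 0', 'Name', 'Score']
print(cols_without)  # ['Name', 'Score']
```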



# **6. Libraries for Regression Analysis**
Some popular libraries:
* statsmodels: For linear regression analysis and other statistical models
* scikit-learn: For machine learning, including regression
* seaborn: For statistical visualization



# **7. Calculating Correlation**


In [None]:

price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
area = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
house_data = pd.DataFrame({"price": price, "area": area})

# Pearson correlation matrix (the pandas default method)
print(house_data.corr())

# Visualizing correlation
import seaborn as sns
import matplotlib.pyplot as plt

sns.heatmap(house_data.corr(), annot=True, cmap="coolwarm")
plt.show()
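`house_data.corr()` computes the Pearson correlation coefficient by default, $r = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2 \sum_i (y_i-\bar{y})^2}}$. As a check on what the matrix contains, this sketch computes $r$ by hand with NumPy on the same house data and compares it with the pandas value.

```python
import numpy as np
import pandas as pd

price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
area = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
house_data = pd.DataFrame({"price": price, "area": area})

x = house_data["area"].to_numpy(dtype=float)
y = house_data["price"].to_numpy(dtype=float)

# Deviations from the mean
dx = x - x.mean()
dy = y - y.mean()

# Pearson r: covariance term divided by the product of the spreads
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
r_pandas = house_data.corr().loc["price", "area"]

print(round(r_manual, 4), round(r_pandas, 4))  # 0.7621 0.7621
```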



# **8. Creating a Scatter Plot**


In [None]:

plt.scatter(house_data["area"], house_data["price"])
plt.title("Scatter Plot of House Price vs. Floor Area")
plt.xlabel("Floor Area (m^2)")
plt.ylabel("House Price (Million Rp)")
plt.show()

# Using statsmodels for linear regression
import statsmodels.api as sm

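# Regression model: price = b0 + b1 * area
X = sm.add_constant(house_data["area"])  # adds an intercept column named "const"
y = house_data["price"]
model = sm.OLS(y, X).fit()
print(model.params)  # estimated intercept (const) and slope (area)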

# Plot scatter plot with regression line
plt.scatter(house_data["area"], house_data["price"], label="Data")
plt.plot(house_data["area"], model.predict(X), color="red", label="Regression Line")
plt.title("Scatter Plot with Regression Line")
plt.xlabel("Floor Area (m^2)")
plt.ylabel("House Price (Million Rp)")
plt.legend()
plt.show()
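For a single predictor, the OLS line has a closed form: slope $= \sum_i (x_i-\bar{x})(y_i-\bar{y}) / \sum_i (x_i-\bar{x})^2$ and intercept $= \bar{y} - \text{slope}\cdot\bar{x}$. The sketch below computes both with plain NumPy and cross-checks them against `np.polyfit` (a different least-squares routine, swapped in here instead of statsmodels); the results should agree with `model.params` from the cell above. The 2000 m^2 input is an arbitrary example value.

```python
import numpy as np

price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
area = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

x = np.array(area, dtype=float)
y = np.array(price, dtype=float)

# Closed-form simple linear regression
slope = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
intercept = y.mean() - slope * x.mean()

# Cross-check: degree-1 polynomial fit returns [slope, intercept]
slope_np, intercept_np = np.polyfit(x, y, 1)

print(f"price = {intercept:.2f} + {slope:.5f} * area")  # price = 98.25 + 0.10977 * area

# Predicted price for an example floor area of 2000 m^2
print(intercept + slope * 2000)
```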
