### **Understanding Data**

#### What is Data?
Data is information that is recorded in the form of numbers, words, or facts.
We can only call something "data" if it is written down, saved, or stored in some form. We use data to understand problems, answer questions, and make decisions.

`Is a photo considered data?`

#### Data Types
1. Structured
2. Semi-structured
3. Unstructured
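
To make the three types concrete, here is a minimal sketch using only the Python standard library. The sample values (names, cities, the review text) are made up for illustration and do not come from a real dataset.

```python
import csv
import io
import json

# Structured: fixed columns, and every row has the same fields (CSV, SQL table).
csv_text = "id,name,city\n1,Ana,Jakarta\n2,Budi,Bandung\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: keys describe the values, but records may differ in shape (JSON).
json_text = '[{"id": 1, "name": "Ana", "tags": ["vip"]}, {"id": 2, "name": "Budi"}]'
records = json.loads(json_text)

# Unstructured: no predefined fields (free text, images, audio).
review = "The delivery was late, but the coffee tasted great."

print(rows[0]["city"])          # every structured row has a "city" column
print(records[1].get("tags"))   # the second JSON record has no "tags" key
print(len(review.split()))      # free text must be parsed before it can be analyzed
```

Notice that the structured rows can be queried by column name right away, while the free-text review needs extra processing before it answers any question.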

#### Data Lifecycle
1. Generation
2. Collection
3. Processing
4. Storage
5. Management
6. Analysis
7. Visualization
8. Interpretation
9. Taking action/decision

Data lifecycle example:
Vehicle-counting sensors upload data to a server → A user sends API requests to collect it → The JSON responses are converted to a row-and-column format → The data is stored in PostgreSQL → A PostgreSQL role is created with permission to SELECT but not to INSERT, UPDATE, or DELETE → An analyst finds the peak traffic hours → A dashboard is created → An action or decision is taken
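
The processing and analysis steps of this example could look roughly like the sketch below. The JSON payload and its field names (`sensor_id`, `timestamp`, `vehicles`) are assumptions made up for illustration; a real sensor API would define its own format. Storage in PostgreSQL and the read-only role are left out to keep the sketch self-contained.

```python
import json
from collections import defaultdict
from datetime import datetime

# Hypothetical API response; the field names are assumptions for illustration.
payload = """
[
  {"sensor_id": "S1", "timestamp": "2024-01-01T07:15:00", "vehicles": 120},
  {"sensor_id": "S1", "timestamp": "2024-01-01T08:15:00", "vehicles": 340},
  {"sensor_id": "S2", "timestamp": "2024-01-01T08:45:00", "vehicles": 290},
  {"sensor_id": "S1", "timestamp": "2024-01-01T17:30:00", "vehicles": 410}
]
"""

# Processing: flatten each JSON object into a (sensor_id, hour, vehicles) row.
records = json.loads(payload)
rows = [
    (r["sensor_id"], datetime.fromisoformat(r["timestamp"]).hour, r["vehicles"])
    for r in records
]

# Analysis: sum the vehicle counts per hour, then pick the busiest hour.
totals = defaultdict(int)
for _sensor_id, hour, vehicles in rows:
    totals[hour] += vehicles

peak_hour = max(totals, key=totals.get)
print(f"Peak hour: {peak_hour}:00 with {totals[peak_hour]} vehicles")
```

With this sample payload the busiest hour is 8, because two sensors reported during that hour and their counts add up.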


### **Data-Driven Decision Making**

#### Data Mining Techniques
- Association → Find co-occurrence or dependency rules in data → Market Basket Analysis
- Classification → Assign data into categories → Email Spam Detection
- Regression → Predict a continuous number → House Price Prediction
- Clustering → Group similar data without labels → Customer Segmentation
- Recommendation → Suggest items based on patterns → Product Recommendations
- Anomaly Detection → Find unusual patterns → Fraud Detection
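
As one illustration, the counting step behind association (market basket analysis) can be sketched in plain Python. The baskets below are made-up sample data, and this is only a co-occurrence count, not a full rule-mining algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Made-up transactions; each set is one customer's basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "coffee"},
    {"bread", "butter", "coffee"},
]

# Count how many baskets contain each unordered pair of items.
pair_counts = Counter()
for basket in baskets:
    # Sorting gives every pair a consistent order, e.g. ("bread", "butter").
    pair_counts.update(combinations(sorted(basket), 2))

# Support = share of all baskets that contain the pair.
support = {pair: count / len(baskets) for pair, count in pair_counts.items()}

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count, support[top_pair])
```

Here bread and butter appear together in 3 of the 4 baskets, which is the kind of pattern a retailer might use to place products or design bundles.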

#### Real World Example
1. Netflix  
  **Problem**: Faced high risks in producing original content without guaranteed audience interest.  
  <br>
  **Action**: Analyzed user viewing habits to identify preferences for political dramas, interest in actor Kevin Spacey, and appreciation for director David Fincher's work.  
  <br>
  **Decision**: Invested $100 million to produce two seasons of House of Cards without creating a pilot episode.  
  <br>
  **Result**: House of Cards became a major success, attracting over 3 million new subscribers within two months of its release. [Read full article](https://www.prequateadvisory.com/post/house-of-cards-was-no-fluke-it-was-a-data-backed-master-move)

2. Starbucks  
  **Problem**: Sought to enhance customer loyalty and increase sales.  
  <br>
  **Action**: Implemented the Deep Brew AI platform to analyze customer data, enabling personalized offers and optimizing store locations.  
  <br>
  **Decision**: Launched targeted promotions through the Starbucks mobile app, tailoring rewards to individual customer preferences.  
  <br>
  **Result**: Achieved a 30% return on investment (ROI) and a 15% increase in customer engagement levels compared to previous marketing methods. [Read full article](https://www.theaireport.ai/articles/how-starbucks-uses-ai-to-make-a-30-roi)


### **Work with Data**

#### Data Sources
1. Data collection: surveys, sensors, web scraping, documents and records
2. Public datasets:
    - [Kaggle](https://www.kaggle.com/datasets)
    - [Google Dataset Search](https://datasetsearch.research.google.com/)
    - [UCI Machine Learning Repository](https://archive.ics.uci.edu/)
    - [US Federal Government](https://data.gov/)
    - [Satu Data Indonesia](https://data.go.id/)

#### Data Tools
- Programming languages: R & Python
- Spreadsheets: Excel & Google Sheets
- Desktop apps: RapidMiner, KNIME, Orange
- Notebook environments: Jupyter Notebook & Google Colab
- Databases: MySQL & PostgreSQL (relational databases), BigQuery & Snowflake (cloud data warehouses)
- Visualization: Tableau, Power BI, Google Looker Studio

### **Reflection**
You now know the kinds of things we can do with data. As a data analyst, which skills should you acquire? Find out by researching what the industry asks for on job portals or freelance platforms.

(answer here)

### **Exploration**
We will walk through the essential data analyst skills using Python. Pandas is a powerful Python library for working with tabular data. Please review [10 minutes to pandas](https://pandas.pydata.org/docs/user_guide/10min.html) before our next session.