# Data Analysis Basics
**데이터 분석 기초**

**Duration (수업 시간)**: 3 hours (3시간)  
**Structure (구성)**: Lecture & Lab 2 hours + Quiz 1 hour (강의 및 실습 2시간 + 퀴즈 1시간)  
**Level (수준)**: Intermediate (중급)

---

## Learning Objectives (학습 목표)

By the end of this lesson, students will be able to:
이 수업을 마친 후 학생들은 다음을 할 수 있습니다:

- Calculate basic statistics from data (데이터에서 기본 통계 계산)
- Filter data based on simple conditions (간단한 조건으로 데이터 필터링)
- Group data and find totals (데이터 그룹화 및 총계 구하기)

---

## 1. What is Data Analysis? (데이터 분석이란?)

Data analysis means examining numbers to answer questions, for example counting how many students passed a test.
데이터 분석은 답을 찾기 위해 숫자를 보는 것을 의미합니다. 시험에 합격한 학생 수를 세는 것과 같습니다.

### Simple Example (간단한 예시)

In [None]:
# Test scores
scores = [85, 92, 78, 96, 83]

# Basic analysis
total = sum(scores)
count = len(scores)
average = total / count

print(f"Number of students: {count}")
print(f"Average score: {average}")
print(f"Highest score: {max(scores)}")
print(f"Lowest score: {min(scores)}")

---

## 2. Basic Statistics (기본 통계)

The most important numbers you need to know:
알아야 할 가장 중요한 숫자들:

- **Sum**: Add all numbers (합계: 모든 숫자 더하기)
- **Average**: Sum divided by count (평균: 합계를 개수로 나누기)
- **Maximum**: Biggest number (최대값: 가장 큰 숫자)
- **Minimum**: Smallest number (최소값: 가장 작은 숫자)
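
As a minimal sketch of the four statistics listed above, the snippet below uses Python's built-in `sum`, `max`, and `min` together with `mean` from the standard-library `statistics` module. The `values` list is hypothetical sample data chosen only for illustration.

```python
from statistics import mean

# Hypothetical sample data (not from the lesson)
values = [4, 8, 15, 16, 23, 42]

total = sum(values)      # Sum: add all numbers
average = mean(values)   # Average: sum divided by count
largest = max(values)    # Maximum: biggest number
smallest = min(values)   # Minimum: smallest number

print(f"Sum: {total}")
print(f"Average: {average}")
print(f"Maximum: {largest}")
print(f"Minimum: {smallest}")
```

`mean` gives the same result as dividing the sum by the count, so either approach works for small lists like this one.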

### Simple Statistics Function (간단한 통계 함수)

In [None]:
def calculate_stats(numbers):
    if not numbers:
        # No data: return an empty dict so callers can still use .items()
        return {}
    
    total = sum(numbers)
    count = len(numbers)
    average = total / count
    
    return {
        'total': total,
        'count': count,
        'average': round(average, 1),
        'max': max(numbers),
        'min': min(numbers)
    }

# Test it
grades = [88, 92, 76, 85, 90]
stats = calculate_stats(grades)

for key, value in stats.items():
    print(f"{key}: {value}")

---

## 3. Filtering Data (데이터 필터링)

Filtering means selecting only the data you want, for example finding all students who scored 85 or higher.
필터링은 원하는 데이터만 선택하는 것을 의미합니다. 85점 이상 받은 모든 학생을 찾는 것과 같습니다.

### Basic Filtering (기본 필터링)

In [None]:
# Student data
students = [
    {'name': 'Alice', 'grade': 92},
    {'name': 'Bob', 'grade': 78},
    {'name': 'Carol', 'grade': 85},
    {'name': 'David', 'grade': 95}
]

# Find students with grade >= 85
good_students = []
for student in students:
    if student['grade'] >= 85:
        good_students.append(student)

print("Students with grade 85 or higher:")
for student in good_students:
    print(f"{student['name']}: {student['grade']}")
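
The same filter can probably be written more compactly with a list comprehension. This is a sketch of an alternative style, not a replacement for the loop above; it reuses the same student data and the same `>= 85` condition.

```python
# Student data (same as in the loop version)
students = [
    {'name': 'Alice', 'grade': 92},
    {'name': 'Bob', 'grade': 78},
    {'name': 'Carol', 'grade': 85},
    {'name': 'David', 'grade': 95}
]

# Keep only the students whose grade is 85 or higher
good_students = [student for student in students if student['grade'] >= 85]

print("Students with grade 85 or higher:")
for student in good_students:
    print(f"{student['name']}: {student['grade']}")
```

Both versions build the same list; the comprehension just puts the loop and the condition on one line.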

---

## 4. Grouping Data (데이터 그룹화)

Grouping means putting similar things together and then counting or totaling each group.
그룹화는 비슷한 것들을 함께 모아서 그룹별로 개수를 세거나 합계를 구하는 것을 의미합니다.

### Simple Grouping Example (간단한 그룹화 예시)

In [None]:
# Sales by month
sales = [
    {'month': 'January', 'amount': 1000},
    {'month': 'February', 'amount': 1500},
    {'month': 'January', 'amount': 800},
    {'month': 'February', 'amount': 1200}
]

# Group by month
monthly_totals = {}
for sale in sales:
    month = sale['month']
    amount = sale['amount']
    
    if month in monthly_totals:
        monthly_totals[month] += amount
    else:
        monthly_totals[month] = amount

print("Monthly totals:")
for month, total in monthly_totals.items():
    print(f"{month}: ${total}")
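
One way to shorten the `if`/`else` in the grouping loop is `dict.get`, which returns a default value when the key is missing. The sketch below assumes the same sales data as above and should produce the same totals.

```python
# Sales by month (same data as above)
sales = [
    {'month': 'January', 'amount': 1000},
    {'month': 'February', 'amount': 1500},
    {'month': 'January', 'amount': 800},
    {'month': 'February', 'amount': 1200}
]

monthly_totals = {}
for sale in sales:
    month = sale['month']
    # get(month, 0) returns the running total, or 0 the first time a month appears
    monthly_totals[month] = monthly_totals.get(month, 0) + sale['amount']

print("Monthly totals:")
for month, total in monthly_totals.items():
    print(f"{month}: ${total}")
```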

---

## Lab Exercises (실습)

### Lab 1: Student Grade Analysis (학생 성적 분석)

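**Problem**: Calculate basic statistics (count, average, highest, lowest) for the student grades, then count how many grades are above average.
문제: 학생 성적의 기본 통계(학생 수, 평균, 최고점, 최저점)를 계산하고 평균보다 높은 성적이 몇 개인지 세어 보세요.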

**Data:**

In [None]:
grades = [88, 92, 76, 95, 83, 89, 91, 87]

**Solution**:

In [None]:
grades = [88, 92, 76, 95, 83, 89, 91, 87]

# Calculate basic statistics
total = sum(grades)
count = len(grades)
average = total / count

print(f"Total students: {count}")
print(f"Average grade: {average:.1f}")
print(f"Highest grade: {max(grades)}")
print(f"Lowest grade: {min(grades)}")

# Count grades above average
above_average = 0
for grade in grades:
    if grade > average:
        above_average += 1

print(f"Students above average: {above_average}")
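
As an optional variation, the counting loop can be sketched with `sum` over a generator expression: each grade above the average contributes 1 to the total. The data is the same list used in this lab.

```python
grades = [88, 92, 76, 95, 83, 89, 91, 87]

average = sum(grades) / len(grades)

# Add 1 for every grade that is strictly greater than the average
above_average = sum(1 for grade in grades if grade > average)

print(f"Average grade: {average:.1f}")
print(f"Students above average: {above_average}")
```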

### Lab 2: Sales Data Analysis (판매 데이터 분석)

**Problem**: Find monthly sales totals and best month.
문제: 월별 판매 총계와 최고의 달을 찾으세요.

**Data:**

In [None]:
sales = [
    {'month': 'Jan', 'sales': 5000},
    {'month': 'Feb', 'sales': 6000},
    {'month': 'Jan', 'sales': 3000},
    {'month': 'Feb', 'sales': 4000},
    {'month': 'Mar', 'sales': 7000}
]

**Solution**:

In [None]:
sales = [
    {'month': 'Jan', 'sales': 5000},
    {'month': 'Feb', 'sales': 6000},
    {'month': 'Jan', 'sales': 3000},
    {'month': 'Feb', 'sales': 4000},
    {'month': 'Mar', 'sales': 7000}
]

# Calculate monthly totals
monthly_totals = {}
for sale in sales:
    month = sale['month']
    amount = sale['sales']
    
    if month in monthly_totals:
        monthly_totals[month] += amount
    else:
        monthly_totals[month] = amount

# Display results
print("Monthly sales:")
for month, total in monthly_totals.items():
    print(f"{month}: ${total}")

# Find best month
best_month = ""
best_sales = 0
for month, total in monthly_totals.items():
    if total > best_sales:
        best_sales = total
        best_month = month

print(f"Best month: {best_month} with ${best_sales}")
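
A shorter way to find the best month is the built-in `max` with a `key` function. The sketch below starts from the monthly totals that the solution computes (written out by hand here so the snippet runs on its own). Unlike a loop that starts from `best_sales = 0`, this approach also works when no total is greater than zero.

```python
# Monthly totals as computed in the solution above
monthly_totals = {'Jan': 8000, 'Feb': 10000, 'Mar': 7000}

# max() walks over the dictionary keys and compares them by their totals
best_month = max(monthly_totals, key=monthly_totals.get)
best_sales = monthly_totals[best_month]

print(f"Best month: {best_month} with ${best_sales}")
```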

### Lab 3: Survey Analysis (설문조사 분석)

**Problem**: Count survey responses by age group and calculate the average satisfaction.
문제: 연령대별 설문 응답 수를 세고 만족도 평균을 계산하세요.

**Data:**

In [None]:
responses = [
    {'age': '18-25', 'satisfaction': 4},
    {'age': '26-35', 'satisfaction': 5},
    {'age': '18-25', 'satisfaction': 3},
    {'age': '26-35', 'satisfaction': 4}
]

**Solution**:

In [None]:
responses = [
    {'age': '18-25', 'satisfaction': 4},
    {'age': '26-35', 'satisfaction': 5},
    {'age': '18-25', 'satisfaction': 3},
    {'age': '26-35', 'satisfaction': 4}
]

# Calculate overall satisfaction
total_satisfaction = 0
count = 0
for response in responses:
    total_satisfaction += response['satisfaction']
    count += 1

average_satisfaction = total_satisfaction / count

print(f"Total responses: {count}")
print(f"Average satisfaction: {average_satisfaction:.1f}/5")

# Count by age group
age_counts = {}
for response in responses:
    age = response['age']
    if age in age_counts:
        age_counts[age] += 1
    else:
        age_counts[age] = 1

print("Responses by age group:")
# Use a separate name so the overall `count` above is not overwritten
for age, group_count in age_counts.items():
    print(f"{age}: {group_count} responses")
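
Grouping and averaging can also be combined. As an optional extension (not part of the lab problem), the sketch below collects the satisfaction scores for each age group and then averages each group. The name `scores_by_age` is introduced here only for illustration.

```python
responses = [
    {'age': '18-25', 'satisfaction': 4},
    {'age': '26-35', 'satisfaction': 5},
    {'age': '18-25', 'satisfaction': 3},
    {'age': '26-35', 'satisfaction': 4}
]

# Step 1: group the satisfaction scores by age group
scores_by_age = {}
for response in responses:
    age = response['age']
    if age in scores_by_age:
        scores_by_age[age].append(response['satisfaction'])
    else:
        scores_by_age[age] = [response['satisfaction']]

# Step 2: average the scores inside each group
average_by_age = {}
for age, scores in scores_by_age.items():
    average_by_age[age] = sum(scores) / len(scores)

print("Average satisfaction by age group:")
for age, avg in average_by_age.items():
    print(f"{age}: {avg:.1f}/5")
```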

---

## Quiz Section (퀴즈)

### Quiz 1: Basic Statistics (기본 통계)

**Question**: Calculate average, maximum, and minimum from this list: [75, 88, 92, 67, 83, 90]. Also count how many scores are above 80.

이 리스트에서 평균, 최대값, 최소값을 계산하세요: [75, 88, 92, 67, 83, 90]. 또한 80점보다 높은 점수가 몇 개인지 세어보세요.

**Write your answer here (답을 여기에 작성하세요)**:

In [None]:
# Your code here

### Quiz 2: Data Filtering (데이터 필터링)

**Question**: From the list of products below, find and print only products with price less than $50.

아래 제품 리스트에서 가격이 $50 미만인 제품만 찾아서 출력하세요.

In [None]:
products = [
    {'name': 'Book', 'price': 25},
    {'name': 'Pen', 'price': 5},
    {'name': 'Laptop', 'price': 800},
    {'name': 'Mouse', 'price': 30}
]

**Write your answer here (답을 여기에 작성하세요)**:

In [None]:
# Your code here

### Quiz 3: Data Grouping (데이터 그룹화)

**Question**: Calculate total sales for each salesperson from the data below.

아래 데이터에서 각 판매원의 총 판매액을 계산하세요.

In [None]:
sales_data = [
    {'person': 'Alice', 'amount': 100},
    {'person': 'Bob', 'amount': 150},
    {'person': 'Alice', 'amount': 200},
    {'person': 'Bob', 'amount': 75}
]

**Write your answer here (답을 여기에 작성하세요)**:

In [None]:
# Your code here

---

## References (참고)

1. **Python Data Basics**: https://docs.python.org/3/tutorial/datastructures.html
2. **Simple Statistics**: https://www.programiz.com/python-programming/statistics
3. **Data Processing Tutorial**: https://realpython.com/python-data-analysis/

---

## Key Points (핵심 포인트)

### Remember (기억하세요)
1. **Basic stats: sum, average, max, min** (기본 통계: 합계, 평균, 최대, 최소)
2. **Filtering: select data that meets conditions** (필터링: 조건을 만족하는 데이터 선택)
3. **Grouping: put similar things together** (그룹화: 비슷한 것들을 함께 모으기)

### Next Week Preview (다음 주 미리보기)
Next week: **Matplotlib Basics** - Making simple charts and graphs
다음 주: **Matplotlib 기초** - 간단한 차트와 그래프 만들기

---

## Homework (숙제)

1. Complete all three lab exercises (3개 실습 모두 완료)
2. Practice with your own simple data (자신만의 간단한 데이터로 연습)
3. Try calculating statistics for different kinds of numbers, such as decimals or negative values (소수나 음수 등 다른 종류의 숫자로 통계 계산 시도)

**Data analysis helps you understand your data better!**
**데이터 분석은 데이터를 더 잘 이해하는 데 도움이 됩니다!**