Perfect! Below are **data engineer-focused Python problem-solving questions** with **realistic datasets (as copy-paste-friendly input)**. Each question includes a small sample dataset so you can copy and run directly in your editor or Jupyter.

---

### 🟢 Beginner Level

---

#### 1. **Read a CSV File and Print Column Names**

**📝 Task:** Read a CSV and print all column names.

**📄 Sample `sample.csv`:**

```csv
id,name,age,department
101,Navin,28,Engineering
102,Raj,32,HR
103,Sneha,25,Finance
```

---

#### 2. **Count Word Frequency in a Text File**

**📝 Task:** Count how many times each word appears.

**📄 Sample `words.txt`:**

```
data is the new oil
data drives decisions
oil and data are valuable
```

---

#### 3. **Convert JSON Data to CSV**

**📝 Task:** Convert JSON to CSV.

**📄 Sample `data.json`:**

```json
[
  {"id": 1, "name": "Navin", "age": 28},
  {"id": 2, "name": "Priya", "age": 26}
]
```

---

#### 4. **Extract Specific Columns from a CSV**

**📝 Task:** From the same `sample.csv`, extract only `name` and `department`.

---

### 🟡 Intermediate Level

---

#### 5. **Group Sales Data by Region**

**📝 Task:** Calculate total sales per region.

**📄 Sample `sales.csv`:**

```csv
region,product,sales
East,Mobile,200
West,Laptop,400
East,Laptop,100
South,Mobile,300
West,Mobile,150
```

---

#### 6. **Flatten Nested JSON**

**📝 Task:** Flatten the nested object into a single-level, row-wise structure.

**📄 Sample JSON (inline string):**

```json
{
  "id": 1,
  "name": "Navin",
  "address": {
    "city": "Delhi",
    "pincode": "110011"
  }
}
```

---

#### 7. **Paginate List of Records**

**📝 Task:** Break the list into pages of `n` records each.

**📄 Sample Data (Python list):**

```python
records = [
    {"id": 1, "name": "Navin"},
    {"id": 2, "name": "Sneha"},
    {"id": 3, "name": "Karan"},
    {"id": 4, "name": "Riya"},
    {"id": 5, "name": "Amit"}
]
```
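
One possible approach, sketched below, is to slice the list in steps of `n`. The helper name `paginate` and the choice of `page_size=2` are illustrative assumptions, not part of the task.

```python
def paginate(items, page_size):
    """Split items into consecutive pages of at most page_size records."""
    if page_size <= 0:
        raise ValueError("page_size must be a positive integer")
    # Step through the list page_size items at a time; the last page may be shorter.
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


records = [
    {"id": 1, "name": "Navin"},
    {"id": 2, "name": "Sneha"},
    {"id": 3, "name": "Karan"},
    {"id": 4, "name": "Riya"},
    {"id": 5, "name": "Amit"},
]

# page_size=2 is an arbitrary choice for the demo.
pages = paginate(records, 2)
for number, page in enumerate(pages, start=1):
    print(f"Page {number}: {[r['name'] for r in page]}")
```

With five records and a page size of 2, this yields three pages, the last holding a single record.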

---

#### 8. **Split Large CSV into Smaller Chunks**

**📝 Task:** Split the file into separate files of 2 rows each.

**📄 Sample `large.csv`:**

```csv
id,name
1,Navin
2,Sneha
3,Karan
4,Riya
5,Amit
```
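
A minimal sketch of one way to do this with the standard `csv` module. It assumes that "2 rows" means 2 data rows and that each output file should repeat the header. The function name `split_csv` and the `chunk_N.csv` naming scheme are hypothetical. The demo writes the sample into a temporary directory so it runs without any existing files.

```python
import csv
import tempfile
from pathlib import Path


def split_csv(source_path, rows_per_file, output_dir):
    """Write every rows_per_file data rows to its own CSV, repeating the header."""
    if rows_per_file <= 0:
        raise ValueError("rows_per_file must be a positive integer")
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []

    def flush(rows, header):
        # Hypothetical naming scheme: chunk_1.csv, chunk_2.csv, ...
        target = output_dir / f"chunk_{len(written) + 1}.csv"
        with open(target, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(rows)
        written.append(target)

    with open(source_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty input file: nothing to split
            return written
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_file:
                flush(chunk, header)
                chunk = []
        if chunk:  # leftover rows that did not fill a whole chunk
            flush(chunk, header)
    return written


# Demo on a temporary copy of the sample data.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "large.csv"
    source.write_text("id,name\n1,Navin\n2,Sneha\n3,Karan\n4,Riya\n5,Amit\n")
    chunks = split_csv(source, 2, Path(tmp) / "chunks")
    chunk_contents = [path.read_text() for path in chunks]
    for path, text in zip(chunks, chunk_contents):
        print(path.name, repr(text))
```

Reading row by row keeps memory use flat, which matters once the input is genuinely large.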

---

### 🔴 Advanced Level

---

#### 9. **Top K Frequent Words**

**📝 Task:** Find the top 2 most frequent words.

**📄 Sample `sentences.txt`:**

```
data data spark cloud
data cloud cloud data
cloud spark data
```
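
A short sketch using `collections.Counter`, whose `most_common(k)` method returns the `k` highest counts. The helper name `top_k_words` is illustrative, and lowercasing the words is an added assumption. The sample text is inlined so the snippet runs without `sentences.txt`.

```python
from collections import Counter


def top_k_words(lines, k):
    """Return the k most frequent whitespace-separated words as (word, count) pairs."""
    counts = Counter()
    for line in lines:
        # Lowercasing is an assumption so that "Data" and "data" count together.
        counts.update(line.lower().split())
    return counts.most_common(k)


sentences = """data data spark cloud
data cloud cloud data
cloud spark data"""

top_two = top_k_words(sentences.splitlines(), 2)
print(top_two)  # [('data', 5), ('cloud', 4)]
```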

---

#### 10. **Join Two Lists of Dicts (like SQL JOIN)**

**📝 Task:** Perform an inner join on `id`.

**📄 Sample Data:**

```python
employees = [
    {"id": 1, "name": "Navin"},
    {"id": 2, "name": "Sneha"},
    {"id": 3, "name": "Karan"}
]

salaries = [
    {"id": 1, "salary": 60000},
    {"id": 3, "salary": 55000}
]
```
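
One way to sketch this is a hash join: index one list by the join key, then look up each row of the other list. The function name `inner_join` is hypothetical. If a non-key field appeared in both lists, the value from the right-hand list would win.

```python
def inner_join(left, right, key):
    """Inner-join two lists of dicts on key, keeping only rows present in both."""
    # Build a lookup table from key value to matching right-hand rows.
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)

    joined = []
    for left_row in left:
        # Rows with no match on the right are skipped (inner join semantics).
        for right_row in right_by_key.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


employees = [
    {"id": 1, "name": "Navin"},
    {"id": 2, "name": "Sneha"},
    {"id": 3, "name": "Karan"},
]

salaries = [
    {"id": 1, "salary": 60000},
    {"id": 3, "salary": 55000},
]

result = inner_join(employees, salaries, "id")
for row in result:
    print(row)
```

Sneha (`id` 2) has no salary record, so she is absent from the result, just as in a SQL `INNER JOIN`.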

---

Let me know which topic you'd like to start practicing (for example, CSV parsing, JSON flattening, data joining, or text analysis) and I'll give you full working code templates with solutions.


In [4]:
# Q1. Read a CSV file and print all column names.
with open("files/sample.csv", "r") as file:
    # The first line of the file is the header row.
    header_line = file.readline()
    columns = header_line.strip().split(",")
    print(columns)


['id', 'name', 'age', 'department']


In [5]:
# Q2. Count word frequency in a text file.
with open("files/words.txt") as r:
    mp = dict()
    for x in r:
        # split() with no argument also handles repeated spaces and blank lines.
        arr = x.split()
        for y in arr:
            mp[y] = mp.get(y, 0) + 1

    print(mp)

{'data': 3, 'is': 1, 'the': 1, 'new': 1, 'oil': 2, 'drives': 1, 'decisions': 1, 'and': 1, 'are': 1, 'valuable': 1}


In [None]:
# Q3. Convert a JSON file to a CSV file.
import json
import csv

with open("files/data.json") as jr, open("files/data_csv.csv", "w", newline="") as wf:
    records = json.load(jr)

    # Use the keys of the first record as the CSV header
    # (assumes every record has the same keys).
    header = list(records[0].keys())
    wrt = csv.DictWriter(wf, fieldnames=header)
    wrt.writeheader()
    wrt.writerows(records)

    

In [29]:
# Q4. Extract the name and department columns from a CSV.
import csv

with open("files/sample.csv", "r", newline="") as csv_file:
    reader = csv.reader(csv_file)  # renamed from str to avoid shadowing the built-in
    isHeader = True
    name_index = -1
    depa_index = -1
    for row in reader:
        if isHeader:
            # Locate the wanted columns once, from the header row.
            name_index = row.index("name")
            depa_index = row.index("department")
            isHeader = False
        # Print the two columns for the header and for every data row.
        print(row[name_index], ",", row[depa_index])
        

name , department
Navin , Engineering
Raj , HR
Sneha , Finance


In [33]:
# Q5. Group sales data by region.

import csv

with open("files/sales.csv") as csv_file:
    mp=dict()
    st=csv.reader(csv_file)
    isHeader=True
    region_index=-1
    sales_index=-1
    for row in st:
        if isHeader:
            region_index=row.index("region")
            sales_index=row.index("sales")
            isHeader=False
        else:
            mp[row[region_index]]= mp.get(row[region_index],0) + int(row[sales_index])
    
    print(mp)

    

{'East': 300, 'West': 550, 'South': 300}


In [36]:
# Q6. Flatten nested JSON and print row by row (first step: inspect the top-level keys and values).

import json
with open("files/data_json.json") as json_read:
    st=json.load(json_read)

    print(st.keys())
    print(st.values())


dict_keys(['id', 'name', 'address'])
dict_values([1, 'Navin', {'city': 'Delhi', 'pincode': '110011'}])
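
The cell above prints the top-level keys and values but leaves `address` nested. A minimal sketch of the flattening step follows. It assumes nested keys are joined with an underscore, which the task does not specify, and `flatten` is a hypothetical helper name. The sample JSON is inlined so the snippet runs on its own.

```python
import json


def flatten(record, parent_key="", sep="_"):
    """Recursively flatten nested dicts into one level, joining keys with sep."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Descend into the nested object, carrying the prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


raw = """
{
  "id": 1,
  "name": "Navin",
  "address": {
    "city": "Delhi",
    "pincode": "110011"
  }
}
"""

flat_row = flatten(json.loads(raw))
for column, value in flat_row.items():
    print(column, "=", value)
```

This sketch leaves lists untouched. Handling arrays, for example by emitting one row per element, would need a separate decision about the output shape.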
