### <font color="brown">Problem Set 7: CSV and JSON Processing - Solution</font>

---

In [3]:
import csv
import json

---

#### Problem 1

You are given a file named *countries.txt*, in which the columns (in order) refer to country id, country name, region id, and average population (in thousands). Use DictReader to read this file and calculate the average population of countries belonging to region 1. Ignore the countries where average population is not available ('NA'/'na'). 

#### Solution

In [4]:
popSum = 0
popCnt = 0
with open('countries.txt') as csvfile:
    data = csv.DictReader(csvfile, delimiter = ' ', fieldnames=['countryID','countryName','regionID','countryPopulation'])
    for row in data:
        # Skip only rows whose population field is missing ('NA' or 'na')
        if row['countryPopulation'].lower() == 'na':
            continue
        print(','.join(row.values()))
        if int(row['regionID']) == 1:
            popSum += int(row['countryPopulation'])
            popCnt += 1
print("Average population of countries in region 1:", popSum/popCnt)

AR,Argentina,2,134
AU,Australia,3,345
BE,Belgium,1,167
BR,Brazil,2,123
CH,Switzerland,1,234
CN,China,3,1120
DE,Germany,1,230
DK,Denmark,1,768
FR,France,1,789
HK,HongKong,3,367
IL,Israel,4,672
IN,India,3,1000
IT,Italy,1,345
JP,Japan,3,345
MX,Mexico,2,657
NG,Nigeria,4,134
N,Netherlands,1,768
UK,UnitedKingdom,1,368
US,UnitedStatesofAmerica,2,440
ZM,Zambia,4,782
Average population of countries in region 1: 458.625
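
The same DictReader pattern can be tried without the file on disk. The following is a minimal sketch, assuming a few made-up space-delimited rows; the sample data and the helper name `average_population` are hypothetical and not taken from *countries.txt*.

```python
import csv
import io

# Hypothetical sample rows in the same space-delimited layout as countries.txt
SAMPLE = """BE Belgium 1 100
CH Switzerland 1 300
XX Nowhere 1 NA
BR Brazil 2 500
"""

FIELDS = ['countryID', 'countryName', 'regionID', 'countryPopulation']

def average_population(lines, region):
    """Average the population of rows in `region`, skipping 'NA'/'na' values."""
    reader = csv.DictReader(lines, delimiter=' ', fieldnames=FIELDS)
    populations = [
        int(row['countryPopulation'])
        for row in reader
        if row['countryPopulation'].lower() != 'na'
        and int(row['regionID']) == region
    ]
    # Return None instead of dividing by zero when no row qualifies
    return sum(populations) / len(populations) if populations else None

result = average_population(io.StringIO(SAMPLE), region=1)
print(result)
```

Returning `None` for an empty region is one possible design choice; it avoids the `ZeroDivisionError` that a plain `popSum/popCnt` would raise when no country matches.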


---

#### Problem 2

Read the given CSV file *departments.csv*, fill in all empty managerID fields with 000, and write the updated file to *departments_upd.csv*. Use the standard `csv.reader` and `csv.writer`. Display the department with the highest average salary.

#### Solution

In [7]:
maxAvgSalary = 0
departmentName = ""
with open('departments.csv') as csvfile:
    reader = csv.reader(csvfile)
   
    with open('departments_upd.csv','w',newline='') as csvout:
        writer = csv.writer(csvout)
        
        columns = next(reader)  
        writer.writerow(columns)
        
        for row in reader:
            if len(row[2].strip()) == 0:
                row[2] = '000'
            avg_salary = int(row[3])
            if avg_salary > maxAvgSalary :
                maxAvgSalary = avg_salary
                departmentName = row[1]
            writer.writerow(row)

print("Department with highest salary is:", departmentName)

Department with highest salary is: Public Relations
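
The reader/writer round trip can also be checked in memory. This is a rough sketch under stated assumptions: the sample rows are invented, and the `department_id` column name is a guess (only `department_name`, `manager_id`, and `avg_salary` appear in this problem set).

```python
import csv
import io

# Hypothetical sample; the real departments.csv may differ
SAMPLE = """department_id,department_name,manager_id,avg_salary
10,Administration,200,4400
20,Marketing,,9500
30,Purchasing, ,4150
"""

reader = csv.reader(io.StringIO(SAMPLE))
out = io.StringIO(newline='')   # newline='' mirrors open(..., newline='')
writer = csv.writer(out)

header = next(reader)           # copy the header row unchanged
writer.writerow(header)

rows = []
for row in reader:
    if not row[2].strip():      # empty or whitespace-only manager id
        row[2] = '000'
    writer.writerow(row)
    rows.append(row)

# max() with a key function replaces the manual running-maximum bookkeeping
best = max(rows, key=lambda r: int(r[3]))
print(best[1])
print(out.getvalue())
```

Using `max()` with a `key` is an alternative to tracking `maxAvgSalary` by hand; it also works when all salaries are zero or negative, which a running maximum initialised to 0 would not handle.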


---

#### Problem 3

Repeat the above problem using DictReader and DictWriter. Also list the departments that share the most common average salary value.

#### Solution

In [10]:
from collections import Counter

maxAvgSalary = 0
departmentName = ""
salaries = Counter()
departmentsWithSal = []
with open('departments.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    
    with open('departments_upd.csv','w',newline='') as csvout:
        writer = csv.DictWriter(csvout,fieldnames=reader.fieldnames)  
       
        writer.writeheader()  
        
        for row in reader:
            if len(row['manager_id'].strip()) == 0:
                row['manager_id'] = '000'
            avg_salary = int(row['avg_salary'])    
            if avg_salary > maxAvgSalary:
                maxAvgSalary = avg_salary
                departmentName = row['department_name']
            writer.writerow(row)

            # Count each salary value so the most common one can be found later
            salaries[avg_salary] += 1
            departmentsWithSal.append((row['department_name'],avg_salary))

most_common_salary = salaries.most_common(1)[0][0]
most_common_salary_depts = [dep for dep,sal in departmentsWithSal if sal == most_common_salary]

print("Department with highest salary is:", departmentName)
print(f"Departments with most common average salary {most_common_salary}: {most_common_salary_depts}")

Department with highest salary is: Public Relations
Departments with most common average salary 1700: ['Administration', 'Purchasing', 'Executive', 'Finance', 'Accounting', 'Treasury', 'Corporate Tax', 'Control And Credit', 'Shareholder Services', 'Benefits', 'Manufacturing', 'Construction', 'Contracting', 'Operations', 'IT Support', 'NOC', 'IT Helpdesk', 'Government Sales', 'Retail Sales', 'Recruiting', 'Payroll']
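
Note that `most_common(1)` returns a single entry even when several salary values are tied for the highest count. A small sketch of one way to collect every tied value, using invented department names and salaries:

```python
from collections import Counter

# Hypothetical (department, average salary) pairs
dept_salaries = [('A', 1700), ('B', 2500), ('C', 1700), ('D', 2500), ('E', 3000)]

counts = Counter(sal for _, sal in dept_salaries)
top_count = max(counts.values())

# Every salary value that occurs top_count times, not just the first one
modes = sorted(sal for sal, n in counts.items() if n == top_count)

# Map each tied salary to the departments that have it
depts_by_mode = {m: [d for d, s in dept_salaries if s == m] for m in modes}
print(depts_by_mode)
```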


---

#### Problem 4

Given the following dictionary, convert it into a JSON string, sorted by key, with indent level 4.

In [5]:
person_dict = {
  "name": "John",
  "age": 30,
  "married": True,
  "divorced": False,
  "children": ("Ann","Billy"),
  "pets": None,
  "cars": [
    {"model": "BMW 230", "mpg": 27.5},
    {"model": "Ford Edge", "mpg": 24.1}
  ]
}

#### Solution

In [7]:
json_str = json.dumps(person_dict, sort_keys=True, indent=4)
print(json_str)

{
    "age": 30,
    "cars": [
        {
            "model": "BMW 230",
            "mpg": 27.5
        },
        {
            "model": "Ford Edge",
            "mpg": 24.1
        }
    ],
    "children": [
        "Ann",
        "Billy"
    ],
    "divorced": false,
    "married": true,
    "name": "John",
    "pets": null
}
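
The output above shows how Python values map to JSON: the tuple becomes an array, `True`/`False` become `true`/`false`, and `None` becomes `null`. A short sketch of what that means for a round trip, using a trimmed-down dictionary chosen for illustration:

```python
import json

# Trimmed-down illustration of the type mapping
original = {"children": ("Ann", "Billy"), "married": True, "pets": None}

text = json.dumps(original, sort_keys=True)
restored = json.loads(text)

print(text)
print(restored)
# The tuple does not survive the round trip: JSON arrays load back as lists
print(type(restored["children"]))
```

Because JSON has no tuple type, `restored` is not equal to `original` here, which matters if later code depends on the value being a tuple.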


---

#### Problem 5

Parse the following JSON string to get all the values of the key ‘name’. The result should be a list. 

In [14]:
sampleJson = """[ 
   { 
      "id":1,
      "name":"name1",
      "color":[ 
         "red",
         "green"
      ]
   },
   { 
      "id":2,
      "name":"name2",
      "color":[ 
         "pink",
         "yellow"
      ]
   }
]"""

#### Solution

In [15]:
data = []
try:
    data = json.loads(sampleJson)
except json.JSONDecodeError as e:
    # Malformed JSON: report it and fall back to the empty list
    print(e)

dataList = [item.get('name') for item in data]
print(dataList)

['name1', 'name2']
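
If some objects lack the key, `item.get('name')` puts `None` into the result. A brief sketch of both behaviours, assuming a made-up input in which one object has no `name`:

```python
import json

# Hypothetical input: the second object has no "name" key
sample = '[{"id": 1, "name": "name1"}, {"id": 2}, {"id": 3, "name": "name3"}]'

try:
    items = json.loads(sample)
except json.JSONDecodeError as err:
    print("Invalid JSON:", err)
    items = []

# Keep only objects that actually have the key
names = [item['name'] for item in items if 'name' in item]

# .get() keeps one entry per object, using None as a placeholder
names_with_none = [item.get('name') for item in items]

print(names)
print(names_with_none)
```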


---

#### Problem 6

For the given JSON string, do the following:

1. Print the total payable amount for employee "Emma". (Total Payable = Salary + Bonus)
2. Get the average salary for all the employees

In [17]:
sampleJson = """{ 
   "company":{ 
      "employee":[
          {
         "name":"emma",
         "payable":{ 
            "salary":7000,
            "bonus":800
         }},
         {
         "name":"josh",
         "payable":{ 
            "salary":9000,
            "bonus":100
         }},
         {
         "name":"jake",
         "payable":{ 
            "salary":5000,
            "bonus":200
         }
      }
   ]
}}"""

#### Solution

In [18]:
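data = json.loads(sampleJson)
# Look the employee up by name rather than relying on list position
emp_emma = next(emp for emp in data['company']['employee'] if emp['name'] == 'emma')
emma_payable = emp_emma['payable']['salary'] + emp_emma['payable']['bonus']
print("Total payable of employee Emma:", emma_payable)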

employees = data['company']['employee']
sal = 0
for emp in employees:
    sal += emp['payable']['salary']
print("Average salary:", sal/len(employees))

Total payable of employee Emma: 7800
Average salary: 7000.0
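
The same calculation extends to every employee. A compact sketch using the data from this problem; the helper `total_payable` and the `by_name` index are illustrative names, not part of the original solution:

```python
import json
from statistics import mean

sample = """{"company": {"employee": [
    {"name": "emma", "payable": {"salary": 7000, "bonus": 800}},
    {"name": "josh", "payable": {"salary": 9000, "bonus": 100}},
    {"name": "jake", "payable": {"salary": 5000, "bonus": 200}}
]}}"""

employees = json.loads(sample)["company"]["employee"]

# Index the payable records by employee name for direct lookup
by_name = {emp["name"]: emp["payable"] for emp in employees}

def total_payable(name):
    """Salary plus bonus; names in the data are lower case."""
    pay = by_name[name.lower()]
    return pay["salary"] + pay["bonus"]

totals = {name: total_payable(name) for name in by_name}
average_salary = mean(pay["salary"] for pay in by_name.values())

print(totals)
print(average_salary)
```

Indexing by name assumes names are unique; with duplicate names a list of matches would be needed instead.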


---

#### Problem 7

Check whether the following JSON is valid. If it is invalid, correct it. <br>
Hint: Check json.tool (https://docs.python.org/3/library/json.html#module-json.tool)

<pre>
sampleJson = { 
   "company":{ 
      "employee":{ 
         "name":"emma",
         "payble":{ 
            "salary":7000
            "bonus":800
         }
      }
   }
}
</pre>

#### Solution

**Option 1, using command line:** <br>

echo '{ "company":{ "employee":{ "name":"emma", "payble":{ "salary":7000 "bonus":800} } } }' | python -m json.tool

Output: an `Expecting ',' delimiter` error reported at the position just after `"salary":7000`<br>

Solution: Add ',' after the salary value (`"salary":7000,`)

**Option 2:**

In [21]:
def isValidJSON(jsonData):
    try:
        json.loads(jsonData)
    except ValueError as err:
        return False
    return True

InvalidJsonData = """{ "company":{ "employee":{ "name":"emma", "payble":{ "salary":7000 "bonus":800} } } }"""
isValid = isValidJSON(InvalidJsonData)
print(isValid)

False
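
To finish the exercise, the corrected string can be validated as well. The sketch below also reports where parsing failed, using the `msg` and `pos` attributes of `json.JSONDecodeError`; the helper name `describe` is made up for this illustration.

```python
import json

invalid = '{ "company":{ "employee":{ "name":"emma", "payble":{ "salary":7000 "bonus":800} } } }'
# Corrected version: a comma added after the "salary" entry
corrected = '{ "company":{ "employee":{ "name":"emma", "payble":{ "salary":7000, "bonus":800} } } }'

def describe(json_text):
    """Return 'valid', or a short description of the first parse error."""
    try:
        json.loads(json_text)
    except json.JSONDecodeError as err:
        return f"invalid: {err.msg} at position {err.pos}"
    return "valid"

print(describe(invalid))
print(describe(corrected))
```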
