## Get number of companies per sector

Develop a function that returns the number of companies per sector. This Spark practice test primarily evaluates the ability to group the data in a Spark Data Frame by a key and perform aggregations.
* The input Data Frame contains the name of each company and its sector.
* The output Data Frame should contain the sector as the key and the number of companies in that sector as its value.
* The output Data Frame should be sorted in ascending order by sector name.
* The second column in the output Data Frame should be named **company_count**.

In [0]:
companies=[
    ('Accenture', 'IT'),
    ('Apple', 'IT'),
    ('Adobe Systems Inc', 'IT'),
    ('Alphabet', 'IT'),
    ('Bank of America Corp', 'Financials'),
    ('Biogen Inc', 'Health Care'),
    ('Campbell Soup', 'Consumer Staples'),
    ('Dr Pepper Snapple Group', 'Consumer Staples'),
    ('ebay Inc', 'IT'),
    ('FedEx Corporation', 'Industrials'),
    ('Ford Motors', 'Consumer Products'),
    ('General Motors', 'Consumer Products'),
    ('Harley-Davidson', 'Consumer Products'),
    ('Hewlett Packard Enterprise', 'IT'),
    ('Intel Corp', 'IT'),
    ('JP Morgan', 'Financials'),
    ('Johnson & Johnson', 'Health Care'),
    ('Microsoft Corp', 'IT'),
    ('Netflix Inc', 'IT'),
    ('Nike', 'Consumer Products')
]

companies_df = spark.createDataFrame(companies, schema='company_name STRING, sector STRING')

### Step 1: Preview the data

Let us first preview the data.

In [0]:
display(companies_df)

company_name,sector
Accenture,IT
Apple,IT
Adobe Systems Inc,IT
Alphabet,IT
Bank of America Corp,Financials
Biogen Inc,Health Care
Campbell Soup,Consumer Staples
Dr Pepper Snapple Group,Consumer Staples
ebay Inc,IT
FedEx Corporation,Industrials


In [0]:
companies_df.count() # 20

### Step 2: Provide the solution

Develop the required logic in the function below. Once the function is ready, move on to the next step to validate it.

In [0]:
from pyspark.sql.functions import count


def get_company_count_per_sector(companies_df):
    # Develop your logic here
    # Group by sector, count the rows in each group, and sort by sector name.
    company_count_per_sector = companies_df. \
        groupBy('sector'). \
        agg(count('*').alias('company_count')). \
        orderBy('sector')
    return company_count_per_sector
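
The group, count, and sort logic of the function above can be cross-checked without a Spark session. The following is a minimal plain-Python sketch, not part of the original exercise: `sectors` and `count_per_sector` are hypothetical names introduced here, and `sectors` simply repeats the sector column of the sample data in its original order.

```python
from collections import Counter

# Sector column of the sample data, in the same order as `companies`
# (hypothetical helper list, introduced only for this cross-check).
sectors = [
    'IT', 'IT', 'IT', 'IT', 'Financials', 'Health Care',
    'Consumer Staples', 'Consumer Staples', 'IT', 'Industrials',
    'Consumer Products', 'Consumer Products', 'Consumer Products',
    'IT', 'IT', 'Financials', 'Health Care', 'IT', 'IT',
    'Consumer Products',
]


def count_per_sector(sectors):
    """Count items per sector and return records sorted by sector name."""
    counts = Counter(sectors)          # plays the role of groupBy + count
    return [
        {'sector': sector, 'company_count': n}
        for sector, n in sorted(counts.items())  # plays the role of orderBy
    ]


records = count_per_sector(sectors)
for record in records:
    print(record)
```

Python compares strings by code point, which for these ASCII sector names should give the same order as the Spark sort, so the printed records can be compared directly with the expected output in the next step.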

### Step 3: Validate the function

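Let us validate the function by running the cells below.
* Here is the expected output. `IT` sorts before `Industrials` because the default string ordering is case-sensitive, and uppercase `T` precedes lowercase `n`.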

```python
[{'sector': 'Consumer Products', 'company_count': 4},
 {'sector': 'Consumer Staples', 'company_count': 2},
 {'sector': 'Financials', 'company_count': 2},
 {'sector': 'Health Care', 'company_count': 2},
 {'sector': 'IT', 'company_count': 9},
 {'sector': 'Industrials', 'company_count': 1}]
```

In [0]:
company_count_per_sector = get_company_count_per_sector(companies_df)

In [0]:
display(company_count_per_sector)

sector,company_count
Consumer Products,4
Consumer Staples,2
Financials,2
Health Care,2
IT,9
Industrials,1


In [0]:
company_count_per_sector.count() # 6

In [0]:
company_count_per_sector.toPandas().to_dict(orient='records')
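
Comparing the records by eye works for six rows, but the check can also be automated. The following is a minimal sketch under stated assumptions: `expected` is copied from the expected output above, and `check_records` is a hypothetical helper introduced here, not part of the original notebook.

```python
# Expected output, copied from the listing earlier in this step.
expected = [
    {'sector': 'Consumer Products', 'company_count': 4},
    {'sector': 'Consumer Staples', 'company_count': 2},
    {'sector': 'Financials', 'company_count': 2},
    {'sector': 'Health Care', 'company_count': 2},
    {'sector': 'IT', 'company_count': 9},
    {'sector': 'Industrials', 'company_count': 1},
]


def check_records(actual, expected):
    """Return a list of mismatch messages; an empty list means a match.

    Rows are compared position by position, so a wrong sort order is
    reported as well as a wrong count or a wrong column name.
    """
    problems = []
    if len(actual) != len(expected):
        problems.append(f'expected {len(expected)} rows, got {len(actual)}')
    for position, (got, want) in enumerate(zip(actual, expected)):
        if got != want:
            problems.append(f'row {position}: expected {want}, got {got}')
    return problems


# In the notebook, `actual` would be the list produced by the cell above:
# actual = company_count_per_sector.toPandas().to_dict(orient='records')
# Here the expected list stands in for it so the sketch runs without Spark.
actual = list(expected)
print(check_records(actual, expected))
```

An empty list means the output has the right column names, counts, and sort order.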