In [1]:
import pandas as pd
import numpy as np
import scipy as sp

# The Fundamentals of Data Analytics

## Descriptive, predictive, and prescriptive analytics

1. **Descriptive Analytics**

It tells you what has happened. It can be done through exploratory data analysis.



2. **Predictive Analytics**

It tells you what will happen. It can be achieved by building predictive models.


3. **Prescriptive Analytics**

It tells you how to make something happen. It can be done by deriving key insights and hidden patterns from the data.

## Data collection

Data collection is the natural first step for any data analysis: we can't analyze data we
don't have. In reality, our analysis can begin even before we have the data. When we
decide what we want to investigate or analyze, we have to think about what kind of data
we can collect that will be useful for our analysis. Data can come from many sources; some
examples are given below.

- Web scraping to extract data from a website's HTML (often with Python packages
such as selenium, requests, scrapy, and beautifulsoup)

- Application programming interfaces (APIs) for web services from which we
can collect data with HTTP requests (perhaps using cURL or the requests
Python package)

- Databases (data can be extracted with SQL or another database-querying language)

- Internet resources that provide data for download, such as government websites
or Yahoo! Finance

- Log files

We are surrounded by data, so the possibilities are limitless. It is important, however,
to make sure that we are collecting data that will help us draw conclusions. For example,
if we are trying to determine whether hot chocolate sales are higher when the temperature
is lower, we should collect data on the amount of hot chocolate sold and the temperatures
each day. While it might be interesting to see how far people traveled to get the hot
chocolate, it's not relevant to our analysis.

- Problem statement (understanding the business problem)
- Data collection

## Data wrangling

Data wrangling is the process of preparing the data and getting it into a format that can
be used for analysis. The unfortunate reality of data is that it is often dirty, meaning that
it requires cleaning (preparation) before it can be used. The following are some issues we
may encounter with our data:

- Human errors: Data is recorded (or even collected) incorrectly, such as putting 100
instead of 1000, or typos. In addition, there may be multiple versions of the same
entry recorded, such as New York City, NYC, and nyc.

- Computer error: Perhaps we weren't recording entries for a while (missing data).

- Unexpected values: Maybe whoever was recording the data decided to use
a question mark for a missing value in a numeric column, so now all the entries
in the column will be treated as text instead of numeric values.

- Incomplete information: Think of a survey with optional questions; not everyone
will answer them, so we will have missing data, but not due to computer or
human error.

- Resolution: The data may have been collected per second, while we need hourly
data for our analysis.

- Relevance of the fields: Often, data is collected or generated as a product of some
process rather than explicitly for our analysis. In order to get it to a usable state,
we will have to clean it up.

- Format of the data: Data may be recorded in a format that isn't conducive to
analysis, which will require us to reshape it.

- Misconfigurations in the data-recording process: Data coming from sources such
as misconfigured trackers and/or webhooks may be missing fields or passed in the
wrong order.

Most of these data quality issues can be remedied, but some cannot, such as when the
data is collected daily and we need it on an hourly resolution. It is our responsibility
to carefully examine our data and handle any issues so that our analysis doesn't get
distorted.
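
A few of the issues above can be sketched in pandas. The frame below is made up for illustration: the column names `city` and `amount`, the alias map, and the per-second series are assumptions, not part of any dataset used in this notebook.

```python
import pandas as pd

# Hypothetical raw records with inconsistent labels and a '?' placeholder
raw = pd.DataFrame({
    'city': ['New York City', 'NYC', 'nyc', 'Boston'],
    'amount': ['100', '?', '250', '75'],
})

# Human error: map the different spellings of one entry to a single label
aliases = {'nyc': 'New York City', 'new york city': 'New York City'}
lowered = raw['city'].str.strip().str.lower()
raw['city'] = lowered.map(aliases).fillna(raw['city'])

# Unexpected values: '?' forces the column to text; coerce it so '?' becomes NaN
raw['amount'] = pd.to_numeric(raw['amount'], errors='coerce')

# Resolution: downsample two hours of per-second readings to hourly means
index = pd.date_range('2024-01-01', periods=7200, freq=pd.Timedelta(seconds=1))
per_second = pd.Series(range(7200), index=index)
hourly = per_second.resample(pd.Timedelta(hours=1)).mean()

print(raw)
print(hourly)
```

Going the other way (daily data to hourly) is not possible without inventing values, which is why that case cannot be remedied.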

## Exploratory data analysis

During EDA, we use visualizations and summary statistics to get a better understanding
of the data. Since the human brain excels at picking out visual patterns, data visualization
is essential to any analysis. In fact, some characteristics of the data can only be observed
in a plot. Depending on our data, we may create plots to see how a variable of interest
has evolved over time, compare how many observations belong to each category, find
outliers, look at distributions of continuous and discrete variables, and much more.


**Important note:**
Data visualizations are very powerful; unfortunately, they can often be
misleading. One common issue stems from the scale of the y-axis because
most plotting tools will zoom in by default to show the pattern up close.
It would be difficult for software to know what the appropriate axis limits are
for every possible plot; therefore, it is our job to properly adjust the axes before
presenting our results. You can read about some more ways that plots can
be misleading at https://venngage.com/blog/misleadinggraphs/.

In the workflow diagram we saw earlier, EDA and data wrangling shared
a box. This is because they are closely tied:

- Data needs to be prepped before EDA

- Visualizations that are created during EDA may indicate the need for additional
data cleaning.
- Data wrangling uses summary statistics to look for potential data issues, while
EDA uses them to understand the data. Improper cleaning will distort the findings
when we're conducting EDA. In addition, data wrangling skills will be required to
get summary statistics across subsets of the data.

When calculating summary statistics, we must keep the type of data we collected in mind.
Data can be quantitative (measurable quantities) or categorical (descriptions, groupings,
or categories). Within these classes of data, we have further subdivisions that let us know
what types of operations we can perform on them.

For example, categorical data can be nominal, where we assign a numeric value to each
level of the category, such as on = 1/off = 0. Note that the fact that on is greater than
off is meaningless because we arbitrarily chose those numbers to represent the states on
and off. When there is a ranking among the categories, they are ordinal, meaning that
we can order the levels (for instance, we can have low < medium < high).
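
One way to make the nominal/ordinal distinction explicit in pandas is a categorical column. This is a minimal sketch with made-up values; the variable names `priority` and `switch` are hypothetical.

```python
import pandas as pd

# Ordinal: the levels have a meaningful order (low < medium < high)
priority = pd.Series(pd.Categorical(
    ['medium', 'low', 'high', 'low'],
    categories=['low', 'medium', 'high'],
    ordered=True,
))
print(priority.min(), priority.max())   # order-based operations are allowed
print((priority > 'low').tolist())      # comparisons follow the declared order

# Nominal: the labels carry no order, so we count them rather than rank them
switch = pd.Series(['on', 'off', 'on'], dtype='category')
print(switch.value_counts())
```

Declaring the order once keeps later sorting and comparisons consistent with the intended ranking instead of alphabetical order.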

Quantitative data can use an interval scale or a ratio scale. The interval scale includes
things such as temperature. We can measure temperatures in Celsius and compare the
temperatures of two cities, but it doesn't mean anything to say one city is twice as hot
as the other. Therefore, interval scale values can be meaningfully compared using
addition/subtraction, but not multiplication/division. The ratio scale, then, covers those
values that can be meaningfully compared with ratios (using multiplication and division).
Examples of the ratio scale include prices, sizes, and counts.
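
A small arithmetic sketch of why ratios fail on an interval scale. The temperatures and prices are made-up values chosen for illustration.

```python
# Interval scale: differences are meaningful, ratios are not
city_a_c, city_b_c = 10.0, 20.0            # temperatures in Celsius
difference = city_b_c - city_a_c            # "10 degrees warmer" is a valid statement

# The same temperatures in Kelvin, which has a true zero
city_a_k = city_a_c + 273.15
city_b_k = city_b_c + 273.15
ratio_celsius = city_b_c / city_a_c         # depends on where Celsius puts its zero
ratio_kelvin = city_b_k / city_a_k          # a very different number for the same cities

# Ratio scale: prices have a true zero, so the ratio is meaningful
price_a, price_b = 5.0, 10.0
price_ratio = price_b / price_a

print(difference, ratio_celsius, round(ratio_kelvin, 3), price_ratio)
```

The Celsius ratio changes when the unit changes, while the difference between the two cities stays at 10 degrees on both scales.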

### Drawing conclusions

- Did we notice any patterns or relationships when visualizing the data?
- Does it look like we can make accurate predictions from our data? Does it make
sense to move to modeling the data?
- Should we handle missing data points? How?
- How is the data distributed?
- Does the data help us answer the questions we have or give insight into the problem
we are investigating?
- Do we need to collect new or additional data?


## Statistical foundations

#### Population and Sample

## Sampling

There's an important thing to remember before we attempt any analysis: our sample must
be a random sample that is representative of the population. This means that the data
must be sampled without bias (for example, if we are asking people whether they like
a certain sports team, we can't only ask fans of the team) and that we should have (ideally)
members of all distinct groups from the population in our sample (in the sports team
example, we can't just ask men).

We will often need to sample our data, which is itself a sample to begin with. This
is called resampling. Depending on the data, we will have to pick a different method
of sampling. Often, our best bet is a simple random sample: we use a random number
generator to pick rows at random. When we have distinct groups in the data, we want
our sample to be a stratified random sample, which will preserve the proportion of the
groups in the data. In some cases, we don't have enough data for the aforementioned
sampling strategies, so we may turn to random sampling with replacement
(bootstrapping); this is called a bootstrap sample. Note that our underlying sample
needs to have been a random sample or we risk increasing the bias of the estimator
(we could pick certain rows more often because they are in the data more often if it was
a convenience sample, while in the true population these rows aren't as prevalent).
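
The three sampling strategies above can be sketched with pandas. The data frame here is hypothetical (80 rows in group `a`, 20 in group `b`), and the sample sizes are arbitrary choices for the example.

```python
import pandas as pd

# Hypothetical data: 80 rows in group 'a', 20 rows in group 'b'
df = pd.DataFrame({'group': ['a'] * 80 + ['b'] * 20, 'value': range(100)})

# Simple random sample: pick 10 rows at random
simple = df.sample(n=10, random_state=0)

# Stratified random sample: take 10% of each group to keep the 80/20 proportion
stratified = df.groupby('group').sample(frac=0.1, random_state=0)

# Bootstrap sample: same size as the data, drawn with replacement
bootstrap = df.sample(frac=1, replace=True, random_state=0)

print(stratified['group'].value_counts())
print(bootstrap.index.duplicated().sum(), 'rows were drawn more than once')
```

Fixing `random_state` only makes the example reproducible; it is not required for the sampling itself.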

### Descriptive and Inferential Statistics

- Measures of Central Tendency
- Measures of Dispersion/Spread

### Measures of Central Tendency

- A measure of central tendency describes where most of the data is centered
- The three common measures are the mean, the median, and the mode

## Mean

- μ (mu) represents the population mean
- x̄ (x bar) represents the sample mean
- The mean is the sum of all the values divided by the count of values
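
Written out, the two means are as follows. The symbols $N$ (population size), $n$ (sample size), and $x_i$ (the individual values) are notation introduced here for the formula.

$$
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
$$

The calculation is the same in both cases; only the set of values being averaged differs.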

In [38]:
age = [20,25,23,27,28,23,25,24]

In [39]:
(20+25+23+27+28+23+25+24)/8

24.375

In [40]:
salary = [1000,2000,3000,5000,7000,8000]

In [41]:
import numpy as np

In [42]:
np.mean(salary)

4333.333333333333

In [43]:
salary = [1000,2000,3000,5000,7000,8000,70000]

In [44]:
np.mean(salary)

13714.285714285714

## Median
- The median is the 50th percentile of the data
- 50% of the data lies below that value and 50% lies above it

In [45]:
salary = [1000,2000,3000,5000,7000,8000]

In [46]:
np.median(salary)

4000.0

In [47]:
salary = [1000,2000,3000,5000,7000,8000,70000]

In [48]:
np.median(salary)

5000.0
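
The salary cells above can be read as a small experiment: adding a single value of 70000 moves the mean a lot and the median only a little. A minimal sketch of that comparison, using the standard-library `statistics` module instead of NumPy (a substitution made here only to keep the snippet dependency-free):

```python
import statistics

salary = [1000, 2000, 3000, 5000, 7000, 8000]
with_outlier = salary + [70000]

# How far does each measure move when the outlier is added?
mean_shift = statistics.mean(with_outlier) - statistics.mean(salary)
median_shift = statistics.median(with_outlier) - statistics.median(salary)

print(round(mean_shift, 2))
print(median_shift)
```

This is why the median is often preferred for skewed data such as salaries.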

## Odd Number

- Sort the values
- Pick the middle value

In [49]:
ages = [34,56,53,45,78,43,33]

In [50]:
sorted(ages)

[33, 34, 43, 45, 53, 56, 78]

In [51]:
np.median(ages)

45.0

## Even Number

- Sort the values
- Take the two middle values and divide their sum by 2

In [52]:
ages = [34,56,53,45,78,43,33,51]

In [53]:
sorted(ages)

[33, 34, 43, 45, 51, 53, 56, 78]

In [54]:
(45+51)/2

48.0

In [55]:
np.median(ages)

48.0
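
The odd and even rules above can be combined into one small function. This is a sketch for illustration, not how `np.median` is implemented internally.

```python
def median(values):
    """Return the median by sorting and applying the odd/even rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError('median of an empty sequence is undefined')
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return float(ordered[mid])
    # Even count: the average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([34, 56, 53, 45, 78, 43, 33]))
print(median([34, 56, 53, 45, 78, 43, 33, 51]))
```

Both calls use the `ages` lists from the cells above, so the results can be compared with the `np.median` outputs.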

- - -

In [1]:
import numpy as np

In [9]:
patient_ages = [32,45,29,52,34,56,41,48,39,60,28,55]

In [14]:
patient = [28,29,32,34,39,41,45,48,52,55,56,60]

In [32]:
np.mean(patient_ages)

43.25

In [17]:
np.median(patient)

43.0

In [18]:
returns = [0.12,0.09,-0.05,0.15,0.18,-0.02,0.10,0.07,0.14,-0.08]

In [19]:
np.mean(returns)

0.07

In [23]:
returns = [0.10,0.12,0.14,0.15,0.18,0.09,0.07,-0.02,-0.05,-0.08]

In [24]:
np.median(returns)

0.095

In [25]:
gene_expressions = [2.5,3.0,2.2,3.5,2.8,3.2,2.9,3.1,2.7,3.3]

In [26]:
np.mean(gene_expressions)

2.92

In [30]:
gene_expressions = [2.2,2.5,2.7,2.8,2.9,3.0,3.1,3.2,3.3,3.5]

In [31]:
np.median(gene_expressions)

2.95

### Practical Implementation

In [34]:
import pandas as pd

In [2]:
housing = pd.read_csv(r'C:\data analytics\pandas_docs/housing2 (1).csv')  # raw string so the backslashes are not treated as escapes
housing

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,ocean_proximity,gender
0,-122.23,37.88,41.0,880,129.0,322.0,126,8.3252,452600,NEAR BAY,male
1,-122.22,37.86,21.0,7099,1106.0,2401.0,1138,8.3014,358500,NEAR BAY,female
2,-122.24,37.85,52.0,1467,190.0,496.0,177,7.2574,352100,NEAR BAY,male
3,-122.25,37.85,52.0,1274,235.0,558.0,219,5.6431,341300,NEAR BAY,female
4,-122.25,37.85,,1627,280.0,,259,3.8462,342200,NEAR BAY,male
...,...,...,...,...,...,...,...,...,...,...,...
20635,-121.09,39.48,25.0,1665,374.0,845.0,330,1.5603,78100,INLAND,female
20636,-121.21,39.49,18.0,697,150.0,356.0,114,2.5568,77100,INLAND,male
20637,-121.22,39.43,17.0,2254,485.0,1007.0,433,1.7000,92300,INLAND,female
20638,-121.32,39.43,18.0,1860,409.0,741.0,349,1.8672,84700,INLAND,male


In [3]:
housing1 = housing.copy()

### Mean & Median

#### Longitude

In [4]:
housing['longitude'].mean()

-119.56970445736432

In [5]:
housing['longitude'].median()

-118.49

#### Latitude

In [9]:
housing['latitude'].mean()

35.63186143410853

In [10]:
housing['latitude'].median()

34.26

#### Median age of house

In [12]:
housing['housing_median_age'].mean()

28.676282994799333

In [13]:
housing['housing_median_age'].median()

29.0

#### Total Rooms

In [14]:
housing['total_rooms'].mean()

2635.7630813953488

In [15]:
housing['total_rooms'].median()

2127.0

#### Total Bedrooms

In [16]:
housing['total_bedrooms'].mean()

539.9201040741211

In [17]:
housing['total_bedrooms'].median()

435.0

#### Population

In [18]:
housing['population'].mean()

1424.9287240240824

In [19]:
housing['population'].median()

1166.0

#### Households

#### Median Income

In [4]:
housing['median_income'].max()

15.0001

In [5]:
housing['median_income'].min()

0.4999

In [28]:
housing['median_income'].mean()

3.9394028646561856

In [29]:
housing['median_income'].median()

3.5871

#### Median House Value

In [30]:
housing['median_house_value'].mean()

206855.81690891474

In [31]:
housing['median_house_value'].median()

179700.0

### Mode

In [7]:
housing.head()

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,ocean_proximity,gender
0,-122.23,37.88,41.0,880,129.0,322.0,126,8.3252,452600,NEAR BAY,male
1,-122.22,37.86,21.0,7099,1106.0,2401.0,1138,8.3014,358500,NEAR BAY,female
2,-122.24,37.85,52.0,1467,190.0,496.0,177,7.2574,352100,NEAR BAY,male
3,-122.25,37.85,52.0,1274,235.0,558.0,219,5.6431,341300,NEAR BAY,female
4,-122.25,37.85,,1627,280.0,,259,3.8462,342200,NEAR BAY,male


In [23]:
housing = housing.dropna()

In [25]:
housing.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10177 entries, 0 to 20639
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   longitude           10177 non-null  float64
 1   latitude            10177 non-null  float64
 2   housing_median_age  10177 non-null  float64
 3   total_rooms         10177 non-null  int64  
 4   total_bedrooms      10177 non-null  float64
 5   population          10177 non-null  float64
 6   households          10177 non-null  object 
 7   median_income       10177 non-null  float64
 8   median_house_value  10177 non-null  int64  
 9   ocean_proximity     10177 non-null  object 
 10  gender              10177 non-null  object 
dtypes: float64(6), int64(2), object(3)
memory usage: 954.1+ KB


In [29]:
housing['latitude'].mode()

0    34.05
Name: latitude, dtype: float64

In [30]:
import scipy as sp
import scipy.stats  # load the stats submodule explicitly so sp.stats is available

In [31]:
sp.stats.mode(housing['latitude'])

ModeResult(mode=34.05, count=128)

In [32]:
housing['latitude'].value_counts()

latitude
34.05    128
37.76    121
37.79    114
37.80    110
34.04    108
        ... 
40.17      1
40.26      1
39.92      1
39.32      1
39.27      1
Name: count, Length: 759, dtype: int64
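
`Series.mode()` returns a Series rather than a single number because a dataset can have more than one mode. A toy example with made-up values:

```python
import pandas as pd

scores = pd.Series([1, 1, 2, 2, 3])

modes = scores.mode()            # every value tied for the highest frequency
counts = scores.value_counts()   # frequency of each distinct value

print(modes.tolist())
print(counts.max())
```

Indexing with `[0]` takes the first mode, which is the smallest one when there are ties.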

In [37]:
sp.stats.mode(housing['longitude'])

ModeResult(mode=-118.3, count=130)

In [38]:
sp.stats.mode(housing['housing_median_age'])

ModeResult(mode=52.0, count=827)

In [39]:
sp.stats.mode(housing['total_rooms'])

ModeResult(mode=1438, count=11)

In [40]:
sp.stats.mode(housing['total_bedrooms'])

ModeResult(mode=322.0, count=30)

In [41]:
sp.stats.mode(housing['population'])

ModeResult(mode=761.0, count=16)

In [43]:
# sp.stats.mode(housing['households'])  # skipped: households has dtype object

In [44]:
sp.stats.mode(housing['median_income'])

ModeResult(mode=2.875, count=29)

In [45]:
sp.stats.mode(housing['median_house_value'])

ModeResult(mode=500001, count=542)

In [47]:
housing['gender'].mode()

0    female
Name: gender, dtype: object

In [48]:
housing['ocean_proximity'].mode()

0    <1H OCEAN
Name: ocean_proximity, dtype: object

In [49]:
housing['households'].mode()

0    297
Name: households, dtype: object
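
`households` is stored as `object`, which is why the SciPy call is skipped and no mean or median is reported for it. One possible way to handle such a column is to coerce it to numbers first. The toy values below, including the `'?'` entry, are made up and are not the actual contents of the file.

```python
import pandas as pd

# Toy stand-in for an object-dtype column of counts
households_raw = pd.Series(['126', '1138', '177', '?', '177'], dtype='object')

# Entries that cannot be parsed become NaN instead of raising an error
households_num = pd.to_numeric(households_raw, errors='coerce')

print(households_num.mean())
print(households_num.median())
print(households_num.mode()[0])
```

Before relying on this, it is worth checking how many entries were turned into NaN, for example with `households_num.isna().sum()`.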

In [2]:
netflix = pd.read_csv(r'C:\data analytics\pandas_docs/Netflix TV Shows and Movies.csv')  # raw string so the backslashes are not treated as escapes
netflix

Unnamed: 0,index,id,title,type,description,release_year,age_certification,runtime,imdb_id,imdb_score,imdb_votes
0,0,tm84618,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,113,tt0075314,8.3,795222.0
1,1,tm127384,Monty Python and the Holy Grail,MOVIE,"King Arthur, accompanied by his squire, recrui...",1975,PG,91,tt0071853,8.2,530877.0
2,2,tm70993,Life of Brian,MOVIE,"Brian Cohen is an average young Jewish man, bu...",1979,R,94,tt0079470,8.0,392419.0
3,3,tm190788,The Exorcist,MOVIE,12-year-old Regan MacNeil begins to adapt an e...,1973,R,133,tt0070047,8.1,391942.0
4,4,ts22164,Monty Python's Flying Circus,SHOW,A British sketch comedy series with the shows ...,1969,TV-14,30,tt0063929,8.8,72895.0
...,...,...,...,...,...,...,...,...,...,...,...
5278,5278,tm1040816,Momshies! Your Soul is Mine,MOVIE,Three women with totally different lives accid...,2021,,108,tt14412240,5.8,26.0
5279,5279,tm1014599,Fine Wine,MOVIE,A beautiful love story that can happen between...,2021,,100,tt13857480,6.9,39.0
5280,5280,tm1045018,Clash,MOVIE,A man from Nigeria returns to his family in Ca...,2021,,88,tt14620732,6.5,32.0
5281,5281,tm1098060,Shadow Parties,MOVIE,A family faces destruction in a long-running c...,2021,,116,tt10168094,6.2,9.0


In [5]:
netflix.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5283 entries, 0 to 5282
Data columns (total 11 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   index              5283 non-null   int64  
 1   id                 5283 non-null   object 
 2   title              5283 non-null   object 
 3   type               5283 non-null   object 
 4   description        5278 non-null   object 
 5   release_year       5283 non-null   int64  
 6   age_certification  2998 non-null   object 
 7   runtime            5283 non-null   int64  
 8   imdb_id            5283 non-null   object 
 9   imdb_score         5283 non-null   float64
 10  imdb_votes         5267 non-null   float64
dtypes: float64(2), int64(3), object(6)
memory usage: 454.1+ KB


In [7]:
netflix.select_dtypes(['int64','float64'])

Unnamed: 0,index,release_year,runtime,imdb_score,imdb_votes
0,0,1976,113,8.3,795222.0
1,1,1975,91,8.2,530877.0
2,2,1979,94,8.0,392419.0
3,3,1973,133,8.1,391942.0
4,4,1969,30,8.8,72895.0
...,...,...,...,...,...
5278,5278,2021,108,5.8,26.0
5279,5279,2021,100,6.9,39.0
5280,5280,2021,88,6.5,32.0
5281,5281,2021,116,6.2,9.0


In [20]:
netflix['imdb_score'].mean()

6.5334469051675175

In [21]:
netflix['imdb_score'].median()

6.6

In [6]:
sp.stats.mode(netflix['imdb_score'])

ModeResult(mode=6.6, count=201)

In [11]:
netflix['imdb_score'].mode()

0    6.6
Name: imdb_score, dtype: float64

- The average rating of a movie or show is 6.53
- The median rating is 6.6
- The mode is 6.6, meaning it is the most frequently occurring rating in the dataset (201 titles)

In [22]:
netflix['runtime'].mean()

79.19988642816581

In [23]:
netflix['runtime'].median()

87.0

In [18]:
netflix['runtime'].mode()

0    90
Name: runtime, dtype: int64

In [19]:
sp.stats.mode(netflix['runtime'])

ModeResult(mode=90, count=119)

- The average runtime of a movie or show is 79.20
- The median runtime is 87
- The mode is 90, so 90 is the most common runtime of a show or movie

In [28]:
netflix['type'].unique()

array(['MOVIE', 'SHOW'], dtype=object)

In [25]:
netflix['type'].mode()

0    MOVIE
Name: type, dtype: object

- Movies outnumber shows in this dataset

In [29]:
netflix['release_year'].mean()

2015.8799924285445

In [30]:
netflix['release_year'].median()

2018.0

In [31]:
netflix['release_year'].mode()

0    2019
Name: release_year, dtype: int64

In [32]:
sp.stats.mode(netflix['release_year'])

ModeResult(mode=2019, count=749)

- The mean of release_year is 2015.88
- A year cannot be fractional, so the mean is hard to interpret here
- The median is 2018
- The mode is 2019
- More shows and movies were released in 2019 than in any other year

In [11]:
# Assign the result back instead of using inplace=True on a column selection
netflix['imdb_votes'] = netflix['imdb_votes'].fillna(netflix['imdb_votes'].mode()[0])

In [12]:
sp.stats.mode(netflix['imdb_votes'])

ModeResult(mode=25.0, count=27)

In [13]:
netflix['imdb_votes'].mode()

0    25.0
Name: imdb_votes, dtype: float64

------------------

## Market Analysis

**Q1 - In a retail store, the sales of a product have been recorded over a year. Determine the average monthly sales, the month with the highest sales, and the median sales to understand the overall performance.**

In [13]:
dd  = {'Month':['jan','feb','mar','apr','may','jun','jul','aug','sep','oct','nov','dec'],
        'Product':['pro1','pro2','pro3','pro4','pro5','pro6','pro7','pro8','pro9','pro10','pro11','pro12'],
        'Sale':[225,646,936,232,674,413,174,536,773,822,443,349]}

In [14]:
dd

{'Month': ['jan',
  'feb',
  'mar',
  'apr',
  'may',
  'jun',
  'jul',
  'aug',
  'sep',
  'oct',
  'nov',
  'dec'],
 'Product': ['pro1',
  'pro2',
  'pro3',
  'pro4',
  'pro5',
  'pro6',
  'pro7',
  'pro8',
  'pro9',
  'pro10',
  'pro11',
  'pro12'],
 'Sale': [225, 646, 936, 232, 674, 413, 174, 536, 773, 822, 443, 349]}

In [67]:
data  = pd.DataFrame(dd)
data

Unnamed: 0,Month,Product,Sale
0,jan,pro1,225
1,feb,pro2,646
2,mar,pro3,936
3,apr,pro4,232
4,may,pro5,674
5,jun,pro6,413
6,jul,pro7,174
7,aug,pro8,536
8,sep,pro9,773
9,oct,pro10,822
10,nov,pro11,443
11,dec,pro12,349


In [68]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Month    12 non-null     object
 1   Product  12 non-null     object
 2   Sale     12 non-null     int64 
dtypes: int64(1), object(2)
memory usage: 420.0+ bytes


In [69]:
data['Sale'].mean()

518.5833333333334

- Average monthly sales is 518.58

In [70]:
data['Sale'].max()

936

In [71]:
data[data['Sale'] == data['Sale'].max()]

Unnamed: 0,Month,Product,Sale
2,mar,pro3,936


- Month with highest sale is March

In [46]:
data['Sale'].median()

489.5
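
The three answers for Q1 can also be obtained in one pass. This sketch rebuilds the same sales figures and uses `idxmax` to locate the best month without hard-coding the maximum.

```python
import pandas as pd

data = pd.DataFrame({
    'Month': ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
              'jul', 'aug', 'sep', 'oct', 'nov', 'dec'],
    'Sale': [225, 646, 936, 232, 674, 413, 174, 536, 773, 822, 443, 349],
})

best_row = data.loc[data['Sale'].idxmax()]           # row holding the maximum sale
summary = data['Sale'].agg(['mean', 'median', 'max'])

print(best_row['Month'])
print(summary)
```

Note that `idxmax` returns only the first row if several months tie for the maximum.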

## Educational Assessment

**Q2 - Analyze the performance of students in a class based on their test scores. Compute the class average score, the most common score, and the median score to understand the overall performance and identify the common weaknesses.**

In [9]:
stu = {'Name':['a','b','c','d','e','f','g','h','i','j'],
       'Class':['IX','IX','IX','IX','IX','IX','IX','IX','IX','IX'],
      'Score(200)':[96,124,73,192,124,89,119,48,120,124]}

In [10]:
students = pd.DataFrame(stu)
students

Unnamed: 0,Name,Class,Score(200)
0,a,IX,96
1,b,IX,124
2,c,IX,73
3,d,IX,192
4,e,IX,124
5,f,IX,89
6,g,IX,119
7,h,IX,48
8,i,IX,120
9,j,IX,124


In [8]:
students['Score(200)'].mean()

110.9

- Average score of the class is 110.9

In [11]:
students['Score(200)'].mode()

0    124
Name: Score(200), dtype: int64

- 124 is the most common score 

In [51]:
students['Score(200)'].median()

119.5

- Median score is 119.5

## Public Opinion Analysis

**Q3 - Analyze the survey results on public satisfaction with public transportation services. Compute the average satisfaction level, the most common concern among the respondents, and the median satisfaction level to understand the overall sentiment and identify the key areas for improvement.**

In [16]:
sur = {'Respondent':[1,2,3,4,5,6,7],
       'Pub Transport Serv':['ser1','ser2','ser3','ser4','ser5','ser6','ser7'],
       'Pub Concern':['Bus tranp','Bus tranp','train tranp','Bus tranp','Bus tranp','train tranp','Bus tranp'],
       'Pub Satisf Level(100)':[50,57,70,56,69,76,46]}


In [17]:
survey = pd.DataFrame(sur)
survey

Unnamed: 0,Respondent,Pub Transport Serv,Pub Concern,Pub Satisf Level(100)
0,1,ser1,Bus tranp,50
1,2,ser2,Bus tranp,57
2,3,ser3,train tranp,70
3,4,ser4,Bus tranp,56
4,5,ser5,Bus tranp,69
5,6,ser6,train tranp,76
6,7,ser7,Bus tranp,46


In [18]:
survey['Pub Satisf Level(100)'].mean()

60.57142857142857

- The average public satisfaction level is 60.57

In [19]:
survey['Pub Satisf Level(100)'].median()

57.0

- Median satisfaction is 57

In [20]:
survey['Pub Concern'].mode()

0    Bus tranp
Name: Pub Concern, dtype: object

- Bus transport is the most common concern among the respondents
- Bus transport is therefore the key area for improvement
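
To see whether satisfaction differs by concern, the survey can be grouped before summarizing. This sketch rebuilds the same survey values; grouping is an extra step added here, not part of the original question.

```python
import pandas as pd

survey = pd.DataFrame({
    'Pub Concern': ['Bus tranp', 'Bus tranp', 'train tranp', 'Bus tranp',
                    'Bus tranp', 'train tranp', 'Bus tranp'],
    'Pub Satisf Level(100)': [50, 57, 70, 56, 69, 76, 46],
})

# Number of respondents and mean satisfaction for each concern
by_concern = (
    survey.groupby('Pub Concern')['Pub Satisf Level(100)']
    .agg(['count', 'mean'])
)
print(by_concern)
```

With only seven respondents, any difference between the groups should be read as a hint rather than a conclusion.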