# Overview
Throughout this assignment, you will be performing certain well-defined tasks that’ll not only strengthen your concepts of Plotly and Dash, but will also help you learn a number of new concepts that are useful in analyzing, summarizing and visualizing data in the real world. 

Here is a template notebook with all the tasks mentioned in detail. **Please complete the tasks within the designated section only.**


## Task 1: Data Loading and Data Aggregation
* Load the 3 data files into the variables data_18, data_19, data_20. 

* Data aggregation is the process of gathering data and presenting it in a summarized format. The data may be gathered from multiple data sources with the intent of combining these data sources into a summary for data analysis.         
Similar to how this dataset involves 3 data files, you’ll often be working on combining information from 2 or more files and analysing it. More often than not, GroupBy is a very useful tool for this purpose. 

  Go through this article to learn more some helpful aggregation tools in Python: https://www.bmc.com/blogs/pandas-group-merge-concatenate-join/ 

  **You don't need to aggregate/ merge the datasets in this assignment, it is only for reading purposes.**

In [140]:
import pandas as pd 
data_18=pd.read_csv("/Users/lokeshmamidisetti/Desktop/Survey_2018.csv")
data_19=pd.read_csv("/Users/lokeshmamidisetti/Desktop/Survey_2019.csv")
data_20=pd.read_csv("/Users/lokeshmamidisetti/Desktop/Survey_2020.csv")


## Task 2: Data Analysis
* Display the first 5 rows of the 2018 survey data
* Display a concise summary of the 2020 data and list out 3 observations/inferences that you observe from the result. For this you will need to use the info() method.
* Display the descriptive statistics of the 2018 survey data
* Display the number of missing values in each column of the 2018 survey data
How many people responded to the survey in each of the 3 years? Has the number increased or decreased over the years?
* Display all the unique values and their frequency in the column - “Number of vacation days” of 2020 data. Write down your observations (at least one) for this result. 


In [99]:
data_18.head(5)

Unnamed: 0,Timestamp,Age,Gender,City,Position,Years of experience,Your level,Current Salary,Salary one year ago,Salary two years ago,Are you getting any Stock Options?,Main language at work,Company size,Company type
0,14/12/2018 12:41:33,43.0,M,München,QA Ingenieur,11.0,Senior,77000.0,76200.0,68000.0,No,Deutsch,100-1000,Product
1,14/12/2018 12:42:09,33.0,F,München,Senior PHP Magento developer,8.0,Senior,65000.0,55000.0,55000.0,No,Deutsch,50-100,Product
2,14/12/2018 12:47:36,32.0,M,München,Software Engineer,10.0,Senior,88000.0,73000.0,54000.0,No,Deutsch,1000+,Product
3,14/12/2018 12:50:15,25.0,M,München,Senior Frontend Developer,6.0,Senior,78000.0,55000.0,45000.0,Yes,English,1000+,Product
4,14/12/2018 12:50:31,39.0,M,München,UX Designer,10.0,Senior,69000.0,60000.0,52000.0,No,English,100-1000,Ecom retailer


In [100]:
data_20.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1253 entries, 0 to 1252
Data columns (total 23 columns):
 #   Column                                                                                                                   Non-Null Count  Dtype  
---  ------                                                                                                                   --------------  -----  
 0   Timestamp                                                                                                                1253 non-null   object 
 1   Age                                                                                                                      1226 non-null   float64
 2   Gender                                                                                                                   1243 non-null   object 
 3   City                                                                                                                     1253 non-null   o

#1.there are null values in every column except timestamp column
#2.there are a total of 23 columns in the dataset
#3.there are data types of float64(4), object(19)

In [101]:
data_18.describe()

Unnamed: 0,Age,Years of experience,Current Salary,Salary one year ago,Salary two years ago
count,672.0,732.0,750.0,596.0,463.0
mean,32.183036,8.548497,68381.765333,62187.278523,58013.475162
std,5.107268,4.729557,21196.306557,20163.008663,20413.048908
min,21.0,0.0,10300.0,10001.0,10001.0
25%,29.0,5.0,57000.0,52000.0,48000.0
50%,32.0,8.0,65000.0,60000.0,56000.0
75%,35.0,11.0,75000.0,70000.0,67000.0
max,60.0,38.0,200000.0,200000.0,150000.0


In [102]:
#null values in the 2018 data set
data_18.isnull().sum()

Timestamp                               0
Age                                    93
Gender                                 14
City                                   29
Position                               28
Years of experience                    33
Your level                             22
Current Salary                         15
Salary one year ago                   169
Salary two years ago                  302
Are you getting any Stock Options?     23
Main language at work                  15
Company size                           15
Company type                           35
dtype: int64

In [103]:
response_18=data_18['Timestamp'].count()
response_19=data_19['Zeitstempel'].count()
response_20=data_20['Timestamp'].count()
print(response_18)
print(response_19)
print(response_20)

765
991
1253


we can see that number of responses had increased over the years


In [104]:
data_20['Number of vacation days'].unique()

array(['30', '28', '24', '29', '27', nan, '25', '31', '26', '60', '20',
       '22', '38', '35', '32', '40', '365', '36', '23', '33', '21',
       'unlimited', '14', 'unlimited ', '(no idea)',
       '30 in contract (but theoretically unlimited)', '0', 'Unlimited ',
       '15', '16', '3', '45', '~25', '12', '50', '23+', '99', 'Unlimited',
       '24 labour days', '37.5', '1', '5', '37', '39', '34', '10'],
      dtype=object)

#mispelled unlimited found many times in the data.

## Task 3: Data Cleaning
* Rename the column ‘Position ‘ in the 2020 data to ‘Position’. (without the blank space)
* Check for missing values in 2020 data for all the columns. If there are no missing values, proceed to the next step. If there are missing values in the dataset,
  * For categorical variables, fill the missing values with the mode of the data. Remember if the data type of any variable is ‘object’, it is categorical variable. 
  * For numerical variables, fill the missing values with the mean of the data.

Here's a good blog that displays multiple methods of filling (imputing) missing values: https://jamesrledoux.com/code/imputation 
* Drop the timestamp column for all the three years data since the date and time at which a person filled the survey is irrelevant to us. The year matters and we already know that from the dataset’s name.
* Perform any other data cleaning steps you believe are necessary. (removing outliers, handling missing values in a way to beautify visualizations, making the categories uniform i.e python and Python should mean the same thing etc.) Note that the same steps will have to be performed for all 3 data files.

In [105]:
data_20.rename(columns={'Position ':'Position'},inplace=True,errors='raise')


In [106]:
data_20.columns

Index(['Timestamp', 'Age', 'Gender', 'City', 'Position',
       'Total years of experience', 'Years of experience in Germany',
       'Seniority level', 'Your main technology / programming language',
       'Other technologies/programming languages you use often',
       'Yearly brutto salary (without bonus and stocks) in EUR',
       'Yearly bonus + stocks in EUR',
       'Annual brutto salary (without bonus and stocks) one year ago. Only answer if staying in the same country',
       'Annual bonus+stocks one year ago. Only answer if staying in same country',
       'Number of vacation days', 'Employment status', 'Сontract duration',
       'Main language at work', 'Company size', 'Company type',
       'Have you lost your job due to the coronavirus outbreak?',
       'Have you been forced to have a shorter working week (Kurzarbeit)? If yes, how many hours per week',
       'Have you received additional monetary support from your employer due to Work From Home? If yes, how much in 202

In [107]:
data_20.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1253 entries, 0 to 1252
Data columns (total 23 columns):
 #   Column                                                                                                                   Non-Null Count  Dtype  
---  ------                                                                                                                   --------------  -----  
 0   Timestamp                                                                                                                1253 non-null   object 
 1   Age                                                                                                                      1226 non-null   float64
 2   Gender                                                                                                                   1243 non-null   object 
 3   City                                                                                                                     1253 non-null   o

In [111]:
data_20.isnull().sum()

Timestamp                                                                                                                    0
Age                                                                                                                         27
Gender                                                                                                                      10
City                                                                                                                         0
Position                                                                                                                     6
Total years of experience                                                                                                   16
Years of experience in Germany                                                                                              32
Seniority level                                                                                                

In [150]:
data_20[data_20['Age'].isnull()]

Unnamed: 0,Timestamp,Age,Gender,City,Position,Total years of experience,Years of experience in Germany,Seniority level,Your main technology / programming language,Other technologies/programming languages you use often,...,Annual bonus+stocks one year ago. Only answer if staying in same country,Number of vacation days,Employment status,Сontract duration,Main language at work,Company size,Company type,Have you lost your job due to the coronavirus outbreak?,"Have you been forced to have a shorter working week (Kurzarbeit)? If yes, how many hours per week","Have you received additional monetary support from your employer due to Work From Home? If yes, how much in 2020 in EUR"
11,24/11/2020 11:18:16,,Male,Berlin,Software Engineer,25.0,11.0,Senior,C++,"Python, C/C++, SQL",...,13000.0,24,Self-employed (freelancer),Temporary contract,English,11-50,Product,Yes,,
12,24/11/2020 11:18:22,,,Berlin,Software Engineer,,,Lead,PHP,,...,,,Full-time employee,Unlimited contract,English,1000+,,No,,1000
28,24/11/2020 11:25:35,,Male,Berlin,DevOps,14.0,5.0,Senior,,"Python, Go, AWS, Kubernetes, Docker",...,,30,Full-time employee,Unlimited contract,English,101-1000,Product,No,,
55,24/11/2020 11:33:08,,Male,Berlin,Software Engineer,,1.0,Senior,PHP,,...,,,Full-time employee,,English,,,No,0.0,
113,24/11/2020 11:56:25,,Male,Berlin,QA Engineer,6.0,6.0,Middle,Javascript,"Javascript / Typescript, AWS",...,,30,Full-time employee,Unlimited contract,English,101-1000,Product,No,0.0,
300,24/11/2020 15:47:19,,Male,Berlin,Data Engineer,7.5,1.5,Middle,SQL,"Python, SQL, AWS, Kubernetes, Docker",...,0.0,30,Full-time employee,Unlimited contract,English,101-1000,Product,No,,
330,24/11/2020 17:29:51,,,Munich,Data Scientist,2.0,2.0,Middle,Python,,...,,20,Full-time employee,Unlimited contract,English,51-100,Product,No,,
340,24/11/2020 17:40:38,,Male,Hamburg,Support Engineer,,2.0,Senior,,"Kubernetes, Docker",...,,28,Full-time employee,Unlimited contract,English,,,No,,100
365,24/11/2020 18:31:25,,Male,Berlin,Software Engineer,9.0,1.0,Lead,Java,"AWS, Docker",...,,24,Full-time employee,Unlimited contract,English,1000+,Startup,No,20.0,No
374,24/11/2020 18:48:30,,Male,Karlsruhe,Backend Developer,8.0,8.0,Lead,Python,"Python, C/C++, Javascript / Typescript, Java /...",...,5400.0,29,Part-time employee,Unlimited contract,English,up to 10,Product,No,39.0,10000


In [156]:
cateogry_columns=data_20.select_dtypes(include=['object']).columns.tolist()
integer_columns=data_20.select_dtypes(include=['int64','float64']).columns.tolist()
for column in data_20:
    if data_20[column].isnull().any():
        if(column in data_20):
            data_20[column]=data_20[column].fillna(data_20[column].mode()[0])
        else:
            data_20[column]=data_20[column].fillna(data_20[column].mean)

In [157]:
data_20.isnull().sum()

Timestamp                                                                                                                  0
Age                                                                                                                        0
Gender                                                                                                                     0
City                                                                                                                       0
Position                                                                                                                   0
Total years of experience                                                                                                  0
Years of experience in Germany                                                                                             0
Seniority level                                                                                                            0


In [158]:
data_20.drop(['Timestamp'],axis=1)

Unnamed: 0,Age,Gender,City,Position,Total years of experience,Years of experience in Germany,Seniority level,Your main technology / programming language,Other technologies/programming languages you use often,Yearly brutto salary (without bonus and stocks) in EUR,...,Annual bonus+stocks one year ago. Only answer if staying in same country,Number of vacation days,Employment status,Сontract duration,Main language at work,Company size,Company type,Have you lost your job due to the coronavirus outbreak?,"Have you been forced to have a shorter working week (Kurzarbeit)? If yes, how many hours per week","Have you received additional monetary support from your employer due to Work From Home? If yes, how much in 2020 in EUR"
0,26.0,Male,Munich,Software Engineer,5,3,Senior,TypeScript,"Kotlin, Javascript / Typescript",80000.0,...,10000,30,Full-time employee,Unlimited contract,English,51-100,Product,No,0.0,0
1,26.0,Male,Berlin,Backend Developer,7,4,Senior,Ruby,Javascript / Typescript,80000.0,...,5000,28,Full-time employee,Unlimited contract,English,101-1000,Product,No,0.0,0
2,29.0,Male,Berlin,Software Engineer,12,6,Lead,Javascript / Typescript,"Javascript / Typescript, Docker",120000.0,...,100000,30,Self-employed (freelancer),Temporary contract,English,101-1000,Product,Yes,0.0,0
3,28.0,Male,Berlin,Frontend Developer,4,1,Junior,Javascript,Javascript / Typescript,54000.0,...,0,24,Full-time employee,Unlimited contract,English,51-100,Startup,No,0.0,0
4,37.0,Male,Berlin,Backend Developer,17,6,Senior,C# .NET,".NET, SQL, AWS, Docker",62000.0,...,0,29,Full-time employee,Unlimited contract,English,101-1000,Product,No,0.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1248,31.0,Male,Berlin,Backend Developer,9,5,Senior,Java,"Python, Javascript / Typescript, Java / Scala,...",70000.0,...,72000,26,Full-time employee,Unlimited contract,English,51-100,Product,Yes,0.0,0
1249,33.0,Male,Berlin,Researcher/ Consumer Insights Analyst,10,1.5,Senior,consumer analysis,Javascript / Typescript,60000.0,...,2500,unlimited,Full-time employee,Unlimited contract,English,1000+,Product,No,0.0,0
1250,39.0,Male,Munich,IT Operations Manager,15,2,Lead,PHP,"Python, C/C++, Javascript / Typescript, Java /...",110000.0,...,0,28,Full-time employee,Unlimited contract,English,101-1000,eCommerce,No,0.0,0
1251,26.0,Male,Saarbrücken,Frontend Developer,7,7,Middle,JavaScript,"Javascript / Typescript, Docker, HTML, CSS; Ad...",38350.0,...,36400,27,Full-time employee,Unlimited contract,German,101-1000,Product,No,0.0,0


In [173]:
data_19

Unnamed: 0,Zeitstempel,Age,Gender,City,Seniority level,Position (without seniority),Years of experience,Your main technology / programming language,Yearly brutto salary (without bonus and stocks),Yearly bonus,...,Yearly stocks one year ago. Only answer if staying in same country,Number of vacation days,Number of home office days per month,Main language at work,Company name,Company size,Company type,Сontract duration,Company business sector,0
0,02.12.2019 11:18:26,33.0,Male,Berlin,Senior,Fullstack Developer,13,PHP,64000.0,1000.0,...,0.0,29.0,4.0,English,Zalando,50-100,Startup,unlimited,Tourism,
1,02.12.2019 11:18:35,29.0,Male,Berlin,Middle,Backend Developer,3,Python,55000.0,0.0,...,0.0,22.0,4.0,English,Zalando,10-50,Product,unlimited,Scientific Activities,
2,02.12.2019 11:18:56,30.0,Male,Berlin,Middle,Mobile Developer,4,Kotlin,70000.0,0.0,...,0.0,27.0,4.0,English,Zalando,1000+,Startup,unlimited,Сommerce,
3,02.12.2019 11:19:08,30.0,Male,Berlin,Senior,Backend Developer,6,PHP,63000.0,0.0,...,0.0,24.0,4.0,English,Auto1,100-1000,Product,unlimited,Transport,
4,02.12.2019 11:19:37,32.0,Male,Berlin,Senior,Embedded Developer,10,C/C++,66000.0,0.0,...,0.0,30.0,0.0,English,Luxoft,50-100,Product,unlimited,Automotive,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
986,07.01.2020 09:23:01,30.0,Male,Amsterdam,Senior,Backend Developer,10,Python,71000.0,3000.0,...,0.0,25.0,5.0,English,Zalando,1000+,Product,unlimited,Telecom,
987,07.01.2020 10:08:18,28.0,Male,Amsterdam,Senior,Security Engineer,7,Not Relevant,72000.0,0.0,...,0.0,27.0,5.0,English,ING,1000+,Bank,unlimited,Finance / Insurance,
988,07.01.2020 16:52:43,42.0,Male,Munich,Senior,Manager,9,Not Relevant,68000.0,10000.0,...,0.0,30.0,5.0,English,SAP,1000+,Product,unlimited,Сommerce,
989,08.01.2020 11:18:41,33.0,Male,Berlin,Senior,Software Architect,15,Javascript / Typescript,100000.0,3000.0,...,0.0,26.0,6.0,English,Zalando,1000+,Product,more than 1 year,Health,


In [175]:
data_19.isnull().sum()

Zeitstempel                                                                                               0
Age                                                                                                       0
Gender                                                                                                    0
City                                                                                                      0
Seniority level                                                                                           0
Position (without seniority)                                                                              0
Years of experience                                                                                       0
Your main technology / programming language                                                               0
Yearly brutto salary (without bonus and stocks)                                                           0
Yearly bonus                

In [None]:
data_19

In [180]:
data_19.drop(['Zeitstempel'],axis=1)

Unnamed: 0,Age,Gender,City,Seniority level,Position (without seniority),Years of experience,Your main technology / programming language,Yearly brutto salary (without bonus and stocks),Yearly bonus,Yearly stocks,...,Yearly stocks one year ago. Only answer if staying in same country,Number of vacation days,Number of home office days per month,Main language at work,Company name,Company size,Company type,Сontract duration,Company business sector,0
0,33.0,Male,Berlin,Senior,Fullstack Developer,13,PHP,64000.0,1000.0,1.0,...,0.0,29.0,4.0,English,Zalando,50-100,Startup,unlimited,Tourism,
1,29.0,Male,Berlin,Middle,Backend Developer,3,Python,55000.0,0.0,1.0,...,0.0,22.0,4.0,English,Zalando,10-50,Product,unlimited,Scientific Activities,
2,30.0,Male,Berlin,Middle,Mobile Developer,4,Kotlin,70000.0,0.0,1.0,...,0.0,27.0,4.0,English,Zalando,1000+,Startup,unlimited,Сommerce,
3,30.0,Male,Berlin,Senior,Backend Developer,6,PHP,63000.0,0.0,1.0,...,0.0,24.0,4.0,English,Auto1,100-1000,Product,unlimited,Transport,
4,32.0,Male,Berlin,Senior,Embedded Developer,10,C/C++,66000.0,0.0,1.0,...,0.0,30.0,0.0,English,Luxoft,50-100,Product,unlimited,Automotive,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
986,30.0,Male,Amsterdam,Senior,Backend Developer,10,Python,71000.0,3000.0,0.0,...,0.0,25.0,5.0,English,Zalando,1000+,Product,unlimited,Telecom,
987,28.0,Male,Amsterdam,Senior,Security Engineer,7,Not Relevant,72000.0,0.0,0.0,...,0.0,27.0,5.0,English,ING,1000+,Bank,unlimited,Finance / Insurance,
988,42.0,Male,Munich,Senior,Manager,9,Not Relevant,68000.0,10000.0,1.0,...,0.0,30.0,5.0,English,SAP,1000+,Product,unlimited,Сommerce,
989,33.0,Male,Berlin,Senior,Software Architect,15,Javascript / Typescript,100000.0,3000.0,1.0,...,0.0,26.0,6.0,English,Zalando,1000+,Product,more than 1 year,Health,


In [181]:
data_19.drop(['0'],axis=1)

Unnamed: 0,Zeitstempel,Age,Gender,City,Seniority level,Position (without seniority),Years of experience,Your main technology / programming language,Yearly brutto salary (without bonus and stocks),Yearly bonus,...,Yearly bonus one year ago. Only answer if staying in same country,Yearly stocks one year ago. Only answer if staying in same country,Number of vacation days,Number of home office days per month,Main language at work,Company name,Company size,Company type,Сontract duration,Company business sector
0,02.12.2019 11:18:26,33.0,Male,Berlin,Senior,Fullstack Developer,13,PHP,64000.0,1000.0,...,1000.0,0.0,29.0,4.0,English,Zalando,50-100,Startup,unlimited,Tourism
1,02.12.2019 11:18:35,29.0,Male,Berlin,Middle,Backend Developer,3,Python,55000.0,0.0,...,5000.0,0.0,22.0,4.0,English,Zalando,10-50,Product,unlimited,Scientific Activities
2,02.12.2019 11:18:56,30.0,Male,Berlin,Middle,Mobile Developer,4,Kotlin,70000.0,0.0,...,5000.0,0.0,27.0,4.0,English,Zalando,1000+,Startup,unlimited,Сommerce
3,02.12.2019 11:19:08,30.0,Male,Berlin,Senior,Backend Developer,6,PHP,63000.0,0.0,...,5000.0,0.0,24.0,4.0,English,Auto1,100-1000,Product,unlimited,Transport
4,02.12.2019 11:19:37,32.0,Male,Berlin,Senior,Embedded Developer,10,C/C++,66000.0,0.0,...,5000.0,0.0,30.0,0.0,English,Luxoft,50-100,Product,unlimited,Automotive
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
986,07.01.2020 09:23:01,30.0,Male,Amsterdam,Senior,Backend Developer,10,Python,71000.0,3000.0,...,3000.0,0.0,25.0,5.0,English,Zalando,1000+,Product,unlimited,Telecom
987,07.01.2020 10:08:18,28.0,Male,Amsterdam,Senior,Security Engineer,7,Not Relevant,72000.0,0.0,...,0.0,0.0,27.0,5.0,English,ING,1000+,Bank,unlimited,Finance / Insurance
988,07.01.2020 16:52:43,42.0,Male,Munich,Senior,Manager,9,Not Relevant,68000.0,10000.0,...,9000.0,0.0,30.0,5.0,English,SAP,1000+,Product,unlimited,Сommerce
989,08.01.2020 11:18:41,33.0,Male,Berlin,Senior,Software Architect,15,Javascript / Typescript,100000.0,3000.0,...,5000.0,0.0,26.0,6.0,English,Zalando,1000+,Product,more than 1 year,Health


In [183]:
data_18.isnull().sum()

Timestamp                               0
Age                                    93
Gender                                 14
City                                   29
Position                               28
Years of experience                    33
Your level                             22
Current Salary                         15
Salary one year ago                   169
Salary two years ago                  302
Are you getting any Stock Options?     23
Main language at work                  15
Company size                           15
Company type                           35
dtype: int64

In [184]:
for column in data_18:
    if data_18[column].isnull().any():
        if(column in data_18):
            data_18[column]=data_18[column].fillna(data_18[column].mode()[0])
        else:
            data_18[column]=data_18[column].fillna(data_18[column].mean)

In [185]:
data_18.isnull().sum()

Timestamp                             0
Age                                   0
Gender                                0
City                                  0
Position                              0
Years of experience                   0
Your level                            0
Current Salary                        0
Salary one year ago                   0
Salary two years ago                  0
Are you getting any Stock Options?    0
Main language at work                 0
Company size                          0
Company type                          0
dtype: int64

## Task 4: Data Visualization using Plotly
**Note:** All the tasks below need to be completed using only Plotly and no other Data Visualization library.

* Create a pie chart to analyze the Company types in the year 2019. Are Consulting / Agency companies more popular than Startups? 
* Create a line plot of the Total years of experience vs the current salary(taking the median salary for each of the different experience years) of the year 2018.
* Now, create the above plot again and add 2 more line plots to the same graph, that display the Total years of experience vs the median Yearly brutto salary (without bonus and stocks) of the year 2019 and 2020.
* Create a bar chart to analyse the popularity of the main technology/ programming languages amongst the respondents in the year 2020. Which technology is the most popular? Which technology is the least popular (with less than 4 responses)?
* Create a pie plot indicating the gender ratio of the respondents in the year 2020.


In [221]:
data_19.isnull().sum()

Zeitstempel                                                                                               0
Age                                                                                                       0
Gender                                                                                                    0
City                                                                                                      0
Seniority level                                                                                           0
Position (without seniority)                                                                              0
Years of experience                                                                                       0
Your main technology / programming language                                                               0
Yearly brutto salary (without bonus and stocks)                                                           0
Yearly bonus                

In [279]:
df=data_19['Company type'].value_counts()
df

Product                 650
Startup                 181
Consulting / Agency     117
Bodyshop / Outsource     30
University                6
Bank                      6
Outsource                 1
Name: Company type, dtype: int64

In [289]:
import plotly.express as px
fig=px.pie(df,values=df, 
           names='Company type' ,
            title='Company name')
fig.show()

In [300]:
fig1=px.line(data_18,x='Years of experience',y='Current Salary')
fig1.show()

In [301]:
df=data_20['Company type'].value_counts()
df

Product                785
Startup                252
Consulting / Agency    142
Bank                     5
Media                    3
                      ... 
Handel                   1
Cloud                    1
Big commercial           1
E-Commerce               1
SaaS                     1
Name: Company type, Length: 63, dtype: int64

In [320]:
df2=df[0:]
df2

Product                785
Startup                252
Consulting / Agency    142
Bank                     5
Media                    3
                      ... 
Handel                   1
Cloud                    1
Big commercial           1
E-Commerce               1
SaaS                     1
Name: Company type, Length: 63, dtype: int64

In [345]:
import plotly.express as px
fig=px.pie(df2,values=data_20['Company type'].value_counts(), 
           names='Company type' ,
            title='Company name')
fig.show()

In [325]:
fig1=px.line(data_20,x='Total years of experience',y='Yearly brutto salary (without bonus and stocks) in EUR')
fig1.show()

In [327]:
data_20.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1253 entries, 0 to 1252
Data columns (total 23 columns):
 #   Column                                                                                                                   Non-Null Count  Dtype  
---  ------                                                                                                                   --------------  -----  
 0   Timestamp                                                                                                                1253 non-null   object 
 1   Age                                                                                                                      1253 non-null   float64
 2   Gender                                                                                                                   1253 non-null   object 
 3   City                                                                                                                     1253 non-null   o

In [355]:
df4['language']=data_20['Your main technology / programming language'].value_counts()
df5=pd.DataFrame(df4)
df5

Unnamed: 0,Your main technology / programming language
Java,311
Python,164
PHP,56
C++,38
JavaScript,34
...,...
Terraform,1
Qml,1
Hardware,1
swift,1


In [365]:
fig = px.bar(data_20, x='Your main technology / programming language', y='Gender')
fig.show()

In [376]:
df6=data_20['Gender'].value_counts()
df6

Male       1059
Female      192
Diverse       2
Name: Gender, dtype: int64

In [375]:
fig=px.pie(df6,values=df6, names='Gender', title='Gender')
fig.show()

## Bonus Section [Optional but carries bonus marks]
This dataset is as raw and real as it can get while conducting yearly surveys. You might have observed that the data is not clean and structured and requires some thorough cleaning before deriving meaningful plots. When combined with the power of Plotly and Dash, there are endless possibilities for the insightful visualizations you can create. 

This section is to let you experiment, explore and create as many visualizations as you’d like. You never know, if we like the creativity and the extra work, you might receive some bonus marks!


# Conclusion
This brings us to the end of the assignment and to the bootcamp. We hope you had a great learning time. :)

Now, you can submit your notebook for assessment. 