# OfferZen Tech Stacks vs Company Size

> This notebook shows the distribution of company sizes for any tech stack listed in the OfferZen public dataset.

## Installing Packages

First we need to install `pandas`, `matplotlib` and `seaborn` in our Docker instance.

In [None]:
import sys
!{sys.executable} -m pip install pandas matplotlib seaborn

In [None]:
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

%matplotlib inline

In [None]:
df = pd.read_json('offerzen_company_size.json')
df

## List of Tech Stacks

Use this snippet to see which tech stacks are available in the dataset.

In [None]:
df.tech.unique()
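
To see how often each tech stack appears, and not only which ones exist, `value_counts` can help. This is a minimal sketch, assuming one row per company and tech stack with the `name`, `tech` and `company_size` columns used in this notebook. The sample rows are invented for illustration and are not taken from the OfferZen dataset.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real dataset
sample = pd.DataFrame({
    'name': ['Acme', 'Acme', 'Globex', 'Initech', 'Initech'],
    'tech': ['Java', 'Python', 'Java', 'Python', 'R'],
    'company_size': ['1-15', '1-15', '51-200', '16-50', '16-50'],
})

# Alphabetical list of the distinct tech stacks
tech_stacks = sorted(sample['tech'].unique())

# Number of rows (companies) per tech stack, most common first
tech_counts = sample['tech'].value_counts()
```

On the real data, replace `sample` with `df`.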

## Running the Visualization

After choosing the tech stack you would like to see, run the cells below to visualize the result.

> Try different values for `technology` and see what gets generated.

In [None]:
technology = 'Java'

df_tech = df[df.tech == technology]
result = df_tech[['company_size','name']].groupby('company_size').count().reset_index()

result


In [None]:
sns.set(style="whitegrid")
plot = sns.barplot(data = result, 
            x = 'company_size', 
            y = 'name')
# use better labels to communicate the result
plot.set_title(f"Number of companies vs company sizes that offer the Tech Stack '{technology}'")
plot.set_xlabel("Company Size")
plot.set_ylabel("Number of Companies")
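
The filter and group-by steps are the same for every tech stack, so they could be wrapped in a small helper. This is a sketch only: `count_by_company_size` is a hypothetical name, and the sample rows are invented for illustration.

```python
import pandas as pd

def count_by_company_size(df, technology):
    """Count the companies listing `technology`, grouped by company size."""
    df_tech = df[df['tech'] == technology]
    return df_tech.groupby('company_size')['name'].count().reset_index()

# Hypothetical sample rows standing in for the real dataset
sample = pd.DataFrame({
    'name': ['Acme', 'Hooli', 'Globex', 'Initech'],
    'tech': ['Java', 'Java', 'Java', 'Python'],
    'company_size': ['1-15', '1-15', '51-200', '16-50'],
})

java_counts = count_by_company_size(sample, 'Java')
```

The returned frame has the same `company_size` and `name` columns as `result` above, so it can be passed to `sns.barplot` in the same way.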

### Python

In [None]:
technology = 'Python'

df_tech = df[df.tech == technology]
result = df_tech[['company_size','name']].groupby('company_size').count().reset_index()

result


In [None]:
sns.set(style="whitegrid")
plot = sns.barplot(data = result, 
            x = 'company_size', 
            y = 'name')

# use better labels to communicate the result
plot.set_title(f"Number of companies vs company sizes that offer the Tech Stack '{technology}'")
plot.set_xlabel("Company Size")
plot.set_ylabel("Number of Companies")

In this dataset, Python appears mostly among companies with fewer than 50 employees.
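
An observation like this can be checked against the numbers as well as the chart. The sketch below computes the share of companies in size brackets that start below 50 employees. The size labels and counts are hypothetical and are not taken from the OfferZen dataset, whose labels may look different.

```python
import pandas as pd

# Hypothetical counts per size label, shaped like `result` above
python_counts = pd.DataFrame({
    'company_size': ['1-15', '16-50', '51-200', '201-500'],
    'name': [12, 9, 7, 5],
})

# Lower bound of each size label, e.g. '16-50' -> 16
lower_bound = python_counts['company_size'].str.extract(r'(\d+)', expand=False).astype(int)

# Share of companies whose size bracket starts below 50 employees
small = python_counts.loc[lower_bound < 50, 'name'].sum()
share_small = small / python_counts['name'].sum()
```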

### Microsoft Azure

In [None]:
technology = 'Microsoft Azure'

df_tech = df[df.tech == technology]
result = df_tech[['company_size','name']].groupby('company_size').count().reset_index()

result

In [None]:
sns.set(style="whitegrid")
plot = sns.barplot(data = result, 
            x = 'company_size', 
            y = 'name')

# use better labels to communicate the result
plot.set_title(f"Number of companies vs company sizes that offer the Tech Stack '{technology}'")
plot.set_xlabel("Company Size")
plot.set_ylabel("Number of Companies")

### R

In [None]:
technology = 'R'

df_tech = df[df.tech == technology]
result = df_tech[['company_size','name']].groupby('company_size').count().reset_index()

result

In [None]:
sns.set(style="whitegrid")
plot = sns.barplot(data = result, 
            x = 'company_size', 
            y = 'name')

# use better labels to communicate the result
plot.set_title(f"Number of companies vs company sizes that offer the Tech Stack '{technology}'")
plot.set_xlabel("Company Size")
plot.set_ylabel("Number of Companies")

### Google App Engine

In [None]:
technology = 'Google App Engine'

df_tech = df[df.tech == technology]
result = df_tech[['company_size','name']].groupby('company_size').count().reset_index()

result

In [None]:
sns.set(style="whitegrid")
plot = sns.barplot(data = result, 
            x = 'company_size', 
            y = 'name')

# use better labels to communicate the result
plot.set_title(f"Number of companies vs company sizes that offer the Tech Stack '{technology}'")
plot.set_xlabel("Company Size")
plot.set_ylabel("Number of Companies")

## Improvements

Let's start making improvements and go through them one by one. Each improvement gets a single block, because the changes are easier to see in the visualizations than in the DataFrame objects.

### Ordering the X-Axis
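Grouping sorts the `company_size` labels as strings, so here we reorder the rows by the first number in each label. The X-axis then runs from the smallest to the largest company size.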

In [None]:
technology = 'Java'

df_tech = df[df.tech == technology]
result = df_tech[['company_size','name']].groupby('company_size').count().reset_index()

# Sort the rows by the first number found in each company size label
result['sorted'] = result['company_size'].str.extract(r'(\d+)', expand = False).astype(int)
result = result.sort_values('sorted')

sns.set(style="whitegrid")
plot = sns.barplot(data = result, 
            x = 'company_size', 
            y = 'name')
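
To see why the numeric sort key matters, compare it with a plain string sort, which would place a label such as `'1001+'` before `'16-50'`. This is a minimal sketch with hypothetical size labels and counts; the real dataset's labels may differ.

```python
import pandas as pd

# Hypothetical size labels and counts, deliberately out of order
counts = pd.DataFrame({
    'company_size': ['1001+', '1-15', '201-500', '16-50', '51-200'],
    'name': [3, 12, 5, 9, 7],
})

# Plain string sorting compares character by character
string_order = sorted(counts['company_size'])

# Pull the first number out of each label, e.g. '51-200' -> 51
lower_bound = counts['company_size'].str.extract(r'(\d+)', expand=False).astype(int)

# Sort on that number to get the natural order of the size brackets
ordered = counts.assign(lower_bound=lower_bound).sort_values('lower_bound')
numeric_order = ordered['company_size'].tolist()
```

Here `string_order` starts with `'1-15'`, `'1001+'`, `'16-50'`, while `numeric_order` runs from `'1-15'` up to `'1001+'`.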

### Changing the Color Scheme

Here we switch to a single color for all bars. The default palette gives every bar a different color, which suggests a distinction between the bars that is not meaningful here.

In [None]:
technology = 'Java'

df_tech = df[df.tech == technology]
result = df_tech[['company_size','name']].groupby('company_size').count().reset_index()

# Sort the rows by the first number found in each company size label
result['sorted'] = result['company_size'].str.extract(r'(\d+)', expand = False).astype(int)
result = result.sort_values('sorted')

sns.set(style="whitegrid")
# A single `color` paints every bar the same without needing a `hue` variable
plot = sns.barplot(data = result,
            x = 'company_size',
            y = 'name',
            color = sns.crayon_palette(["Navy Blue"])[0])

### Using Better Labels

Here we add a title and axis labels so that the chart communicates its intent.

In [None]:
sns.set(style="whitegrid")
# A single `color` paints every bar the same without needing a `hue` variable
plot = sns.barplot(data = result,
            x = 'company_size',
            y = 'name',
            color = sns.crayon_palette(["Navy Blue"])[0])

plot.set_title(f"Number of companies vs company sizes that offer the Tech Stack '{technology}'")
plot.set_xlabel("Company Size")
plot.set_ylabel("Number of Companies")
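
The three improvements can be combined into one reusable function. This is a sketch under stated assumptions: `plot_company_sizes` is a hypothetical name, the sample rows are invented, and the matplotlib color name `'navy'` stands in for the crayon color used above.

```python
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

def plot_company_sizes(df, technology, color='navy'):
    """Bar chart of the number of companies per company size for one tech stack."""
    df_tech = df[df['tech'] == technology]
    result = df_tech.groupby('company_size')['name'].count().reset_index()

    # Order the bars by the first number found in each size label
    result['sorted'] = result['company_size'].str.extract(r'(\d+)', expand=False).astype(int)
    result = result.sort_values('sorted')

    sns.set(style='whitegrid')
    fig, ax = plt.subplots()
    sns.barplot(data=result, x='company_size', y='name', color=color, ax=ax)
    ax.set_title(f"Number of companies vs company sizes that offer the Tech Stack '{technology}'")
    ax.set_xlabel('Company Size')
    ax.set_ylabel('Number of Companies')
    return ax

# Hypothetical sample rows standing in for the real dataset
sample = pd.DataFrame({
    'name': ['Acme', 'Hooli', 'Globex', 'Initech'],
    'tech': ['Java', 'Java', 'Java', 'Python'],
    'company_size': ['16-50', '1-15', '201-500', '16-50'],
})

ax = plot_company_sizes(sample, 'Java')
```

With the real data, a call such as `plot_company_sizes(df, 'Python')` would replace the repeated cells for each tech stack.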