To find the most in-demand skills for the three most popular data roles, I filtered the job postings down to those roles and took the top five skills for each. This analysis highlights the most popular job titles and their top skills, showing which skills I should focus on depending on my target role.
View my notebook with detailed steps here: 2_Skill_Demand.ipynb
import matplotlib.pyplot as plt

# job_titles and df_skills_count are built earlier in the notebook
fig, ax = plt.subplots(len(job_titles), 1)

for i, job_title in enumerate(job_titles):
    # First five rows for this role (assumes df_skills_count is sorted by skill_count, descending)
    df_plot = df_skills_count[df_skills_count['job_title_short'] == job_title].head(5)
    df_plot.plot(kind='barh', x='job_skills', y='skill_count', ax=ax[i], title=job_title, legend=False)
    ax[i].invert_yaxis()  # largest bar at the top
    ax[i].set_ylabel('')

fig.suptitle('Counts of Job Skills in Job Postings', fontsize=15, fontweight='semibold')
plt.tight_layout()
plt.show()

- SQL is the most requested skill for Data Analysts and Data Scientists, appearing in over half of the job postings for both roles. For Data Engineers, Python is the most sought-after skill, appearing in 68% of job postings.
- Data Engineers require more specialized technical skills (AWS, Azure, Spark) compared to Data Analysts and Data Scientists who are expected to be proficient in more general data management and analysis tools (Excel, Tableau).
- Python is a versatile skill, highly demanded across all three roles, but most prominently for Data Scientists (72%) and Data Engineers (65%).
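
The counts and percentages quoted above come from counting skills per role. Here is a minimal sketch of that step, assuming each posting stores its skills as a list in a `job_skills` column. The toy data and the `df_job_totals` and `df_skills_perc` names are made up for illustration and are not taken from the notebook.

```python
import pandas as pd

# Hypothetical toy postings; the real dataset is much larger.
df = pd.DataFrame({
    "job_title_short": ["Data Analyst", "Data Analyst", "Data Analyst",
                        "Data Engineer", "Data Engineer"],
    "job_skills": [["sql", "excel"], ["sql", "python"], ["excel"],
                   ["python", "aws"], ["python", "sql"]],
})

# One row per (posting, skill) pair.
df_exploded = df.explode("job_skills")

# How often each skill appears for each role.
df_skills_count = (
    df_exploded.groupby(["job_title_short", "job_skills"])
    .size()
    .reset_index(name="skill_count")
)

# Postings per role: the denominator for the percentage.
df_job_totals = df.groupby("job_title_short").size().reset_index(name="jobs_total")

df_skills_perc = df_skills_count.merge(df_job_totals, on="job_title_short")
df_skills_perc["skill_percent"] = (
    100 * df_skills_perc["skill_count"] / df_skills_perc["jobs_total"]
)

# Sort so that head(5) per role returns the most requested skills first.
df_skills_perc = df_skills_perc.sort_values(
    ["job_title_short", "skill_count"], ascending=[True, False]
)
print(df_skills_perc)
```

Dividing by postings per role, not by total skill mentions, is what lets a figure be read as "share of postings that mention this skill".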
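import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import PercentFormatter

# First five skill columns (df_DA_US_percent is built earlier in the notebook)
df_plot = df_DA_US_percent.iloc[:, :5]

# Set the theme before plotting so it applies to this figure
sns.set_theme(style='ticks')
sns.lineplot(data=df_plot, dashes=False, legend=False)
sns.despine()

plt.title('Trending Top Skills for Data Analysts in the US')
plt.ylabel('Likelihood in Job Posting')
plt.xlabel('2023')
plt.gca().yaxis.set_major_formatter(PercentFormatter(decimals=0))

# Label each line at its right-hand end instead of using a legend
for i in range(5):
    plt.text(11.3, df_plot.iloc[-1, i], df_plot.columns[i])

plt.show()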
Line chart visualizing the trending top skills for data analysts in the US in 2023
- SQL remains the most consistently demanded skill throughout the year, although it shows a gradual decrease in demand.
- Excel experienced a significant increase in demand starting around September, surpassing both Python and Tableau by the end of the year.
- Both Python and Tableau show relatively stable demand throughout the year with some fluctuations but remain essential skills for data analysts. Power BI, while less demanded compared to the others, shows a slight upward trend towards the year's end.
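
The trend lines above plot, for each month, the share of postings that mention a skill. Here is a minimal sketch of how such a table could be built, assuming a month-number column named `job_posted_month_no` and a list-valued `job_skills` column. Both the column name and the toy data are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy data: posting month and skill list for each posting.
df_DA_US = pd.DataFrame({
    "job_posted_month_no": [1, 1, 1, 2, 2],
    "job_skills": [["sql", "excel"], ["sql"], ["python"],
                   ["sql", "python"], ["excel"]],
})

# One row per (posting, skill), then a month x skill table of counts.
df_exploded = df_DA_US.explode("job_skills")
df_pivot = (
    df_exploded.groupby(["job_posted_month_no", "job_skills"])
    .size()
    .unstack(fill_value=0)
)

# Postings per month: the denominator for the share.
monthly_totals = df_DA_US.groupby("job_posted_month_no").size()

# Percentage of each month's postings that mention each skill.
df_percent = df_pivot.div(monthly_totals, axis=0) * 100

# Order columns by overall count so the leading columns are the top skills.
column_order = df_pivot.sum().sort_values(ascending=False).index
df_percent = df_percent[column_order]
print(df_percent.round(1))
```

Normalizing by monthly postings matters here: raw counts would rise and fall with overall hiring volume, while the share isolates how often a skill is requested.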
# Box plot of yearly salary for each job title, in the order given by job_order
sns.boxplot(data=df_US_top6, x='salary_year_avg', y='job_title_short', order=job_order)

# Show x-axis ticks in thousands of dollars, e.g. $100K
ticks_x = plt.FuncFormatter(lambda x, pos: f'${int(x/1000)}K')
plt.gca().xaxis.set_major_formatter(ticks_x)
plt.show()
Box plot visualizing the salary distribution for the top 6 data job titles
- There's a significant variation in salary ranges across different job titles. Senior Data Scientist positions tend to have the highest salary potential, with up to $600K, indicating the high value placed on advanced data skills and experience in the industry.
- Senior Data Engineer and Senior Data Scientist roles show a considerable number of outliers on the higher end of the salary spectrum, suggesting that exceptional skills or circumstances can lead to high pay in these roles. In contrast, Data Analyst roles demonstrate more consistency in salary, with fewer outliers.
- The median salaries increase with the seniority and specialization of the roles. Senior roles (Senior Data Scientist, Senior Data Engineer) not only have higher median salaries but also larger differences in typical salaries, reflecting greater variance in compensation as responsibilities increase.
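
The median and spread comparisons above can be checked numerically. Here is a minimal sketch, assuming `job_order` is derived by sorting titles by median salary. That ordering rule is my assumption, and the salaries are made-up toy values.

```python
import pandas as pd

# Hypothetical toy salaries for three of the job titles.
df_US_top6 = pd.DataFrame({
    "job_title_short": ["Data Analyst"] * 3
                       + ["Data Scientist"] * 3
                       + ["Senior Data Scientist"] * 3,
    "salary_year_avg": [60000, 90000, 75000,
                        100000, 130000, 160000,
                        140000, 150000, 300000],
})

grouped = df_US_top6.groupby("job_title_short")["salary_year_avg"]

# Median, minimum, and maximum salary per title, highest median first.
salary_summary = grouped.agg(["median", "min", "max"]).sort_values(
    "median", ascending=False
)

# Interquartile range: the width of the box in the box plot.
salary_summary["iqr"] = grouped.quantile(0.75) - grouped.quantile(0.25)

# Title order that could be passed to sns.boxplot(order=...).
job_order = salary_summary.index.tolist()
print(salary_summary)
```

The `iqr` column measures the "difference in typical salaries" that the box width shows, while `max` reflects the outliers.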
fig, ax = plt.subplots(2, 1)

# Top 10 Highest Paid Skills for Data Analysts
sns.barplot(data=df_DA_top_pay, x='median', y=df_DA_top_pay.index, hue='median', ax=ax[0], palette='dark:b_r')

# Top 10 Most In-Demand Skills for Data Analysts
sns.barplot(data=df_DA_skills, x='median', y=df_DA_skills.index, hue='median', ax=ax[1], palette='light:b')

plt.show()
Two separate bar graphs visualizing the highest paid skills and most in-demand skills for data analysts in the US.
- The top graph shows specialized technical skills like dplyr, Bitbucket, and GitLab are associated with higher salaries, some reaching up to $200K, suggesting that advanced technical proficiency can increase earning potential.
- The bottom graph highlights that foundational skills like Excel, PowerPoint, and SQL are the most in-demand, even though they may not offer the highest salaries. This demonstrates the importance of these core skills for employability in data analysis roles.
- There's a clear distinction between the skills that are highest paid and those that are most in-demand. Data analysts aiming to maximize their career potential should consider developing a diverse skill set that includes both high-paying specialized skills and widely demanded foundational skills.
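
The two rankings compared above can both come from one per-skill summary. Here is a minimal sketch, assuming list-valued `job_skills` and a `salary_year_avg` column. The toy data and the `top_n = 2` cutoff (the charts use 10) are illustrative.

```python
import pandas as pd

# Hypothetical toy postings with yearly salaries.
df_DA_US = pd.DataFrame({
    "job_skills": [["sql", "excel"], ["sql", "python"], ["excel"],
                   ["python", "dplyr"], ["sql"], ["sql", "excel"]],
    "salary_year_avg": [80000, 100000, 70000, 150000, 90000, 85000],
})

# One row per (posting, skill), then count and median salary per skill.
df_exploded = df_DA_US.explode("job_skills")
skill_stats = df_exploded.groupby("job_skills")["salary_year_avg"].agg(
    ["count", "median"]
)

top_n = 2  # the charts use 10; 2 keeps the toy output readable

# Highest paid: rank by median salary.
df_DA_top_pay = skill_stats.sort_values("median", ascending=False).head(top_n)

# Most in demand: rank by count, then order by median for plotting.
df_DA_skills = (
    skill_stats.sort_values("count", ascending=False)
    .head(top_n)
    .sort_values("median", ascending=False)
)
print(df_DA_top_pay)
print(df_DA_skills)
```

In this toy data the two lists do not overlap, which mirrors the distinction drawn above: rarely mentioned skills can carry high medians, often on the strength of very few postings.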
from adjustText import adjust_text
import matplotlib.pyplot as plt

# adjust_text repositions text labels to reduce overlap; the labeling step is omitted here
# Demand (share of postings) on the x-axis, median salary on the y-axis
plt.scatter(df_DA_skills_high_demand['skill_percent'], df_DA_skills_high_demand['median_salary'])
plt.show()
A scatter plot visualizing the most optimal skills (high paying & high demand) for data analysts in the US.
- The scatter plot shows that most of the programming skills (colored blue) tend to cluster at higher salary levels compared to other categories, indicating that programming expertise might offer greater salary benefits within the data analytics field.
- The database skills (colored orange), such as Oracle and SQL Server, are associated with some of the highest salaries among data analyst tools. This indicates a significant demand and valuation for data management and manipulation expertise in the industry.
- Analyst tools (colored green), including Tableau and Power BI, are prevalent in job postings and offer competitive salaries, showing that visualization and data analysis software are crucial for current data roles. This category not only has good salaries but is also versatile across different types of data tasks.
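
The scatter plot combines demand and pay for each skill. Here is a minimal sketch of how a table like `df_DA_skills_high_demand` could be built, assuming a demand cutoff named `skill_limit`. The cutoff value, the toy data, and the intermediate names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy postings with yearly salaries.
df_DA_US = pd.DataFrame({
    "job_skills": [["sql", "tableau"], ["sql", "python"], ["sql"],
                   ["python", "oracle"], ["tableau"]],
    "salary_year_avg": [90000, 100000, 80000, 120000, 85000],
})
n_postings = len(df_DA_US)

# Count and median salary per skill.
df_exploded = df_DA_US.explode("job_skills")
df_DA_skills = df_exploded.groupby("job_skills")["salary_year_avg"].agg(
    skill_count="count", median_salary="median"
)

# Share of postings that mention each skill, in percent.
df_DA_skills["skill_percent"] = 100 * df_DA_skills["skill_count"] / n_postings

# Keep only skills above the (assumed) demand cutoff.
skill_limit = 30
df_DA_skills_high_demand = df_DA_skills[
    df_DA_skills["skill_percent"] > skill_limit
]
print(df_DA_skills_high_demand)
```

Filtering on demand first keeps rare skills, whose medians rest on a handful of postings, from dominating the salary axis.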
