My personal roadmap to data science based on job offers requirements

t0mm4rx/DataScientistRoadmap

Roadmap to become a Data Scientist

What's this list?

It's hard to get a clear view of what to learn and what to know to be employable, especially when you're not following a traditional curriculum.

This list is a compilation of the most-wanted skills for data scientists, based on online job offers.

I collected hundreds of data scientist job offers in Paris, France, in November 2020. This list may not be representative of the most-wanted skills in other areas or countries.

The raw data extracted from job offers is visible in JobOffers.md.

The lists are ordered by how frequently each item is mentioned in the offers.
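As an illustration of this kind of ordering, here is a minimal sketch that counts how many offers mention each skill and sorts the result by frequency. It is not the script used to build this list: the sample `offers`, the `skills` keywords, and the `count_mentions` helper are all hypothetical, and the real data is the one in JobOffers.md.

```python
import re
from collections import Counter

# Hypothetical sample offers, for illustration only.
offers = [
    "Python, SQL and TensorFlow required. R is a plus.",
    "Strong SQL skills, Python, good communication.",
    "R or Python, experience with Docker.",
]

# Hypothetical keyword list, for illustration only.
skills = ["Python", "R", "SQL", "TensorFlow", "Docker"]


def count_mentions(offers, skills):
    """Return (skill, number of offers mentioning it), most frequent first."""
    counts = Counter()
    for text in offers:
        for skill in skills:
            # Whole-word, case-insensitive match, so that "R" does not
            # match the letter "r" inside words such as "required".
            pattern = rf"(?<!\w){re.escape(skill)}(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[skill] += 1  # each offer counts at most once per skill
    return counts.most_common()


ranking = count_mentions(offers, skills)
for skill, n in ranking:
    print(f"{skill}: mentioned in {n} offer(s)")
```

Counting each offer at most once per skill keeps a single verbose offer from inflating the ranking; counting every occurrence instead would be an equally valid choice.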

Skills

Maths / Theory

Methodology

  • Understand and implement scientific papers.
  • Statistical methodology. Statistical testing, p-value.

Skills

  • General statistics knowledge. Distributions, Bayesian inference, statistical models, probability.
  • Time series analysis.
  • Sequential analysis.
  • Scoring.
  • Regression.
  • Econometrics.
  • Game theory.

Algorithmic

  • Complexity estimation.
  • Graph theory.
  • Approximation algorithms.
  • K-nearest neighbours.

Machine Learning

  • Deep learning. Neural networks theory.
  • Decision trees / Gradient boosted decision trees.
  • Regression / Logistic regression.
  • Reinforcement learning.
  • Convolutional Neural Network.
  • Natural language processing.
  • Ensemble modeling.
  • Recommendation.
  • Clustering.
  • Auto-encoder.
  • Restricted Boltzmann machine.

Data visualisation

  • Qlik.
  • Google Data Studio.
  • Plotly / Dash. For Python/R.
  • Shiny. For R.
  • Chartio.
  • Matplotlib / Seaborn. For Python.
  • Bokeh. For Python, with an R wrapper.
  • Graphviz. For Python/R.
  • Kibana.
  • Power BI.
  • Sweetviz. For Python.

Analytics / All-in-one solutions

  • Dataiku.
  • Druid.
  • H2O.ai.

Production

Python was mentioned twice as often as R, but both are in high demand.

SQL is in as much demand as R; it appears to be an essential skill.

Dashboarding in general is one of the most in-demand skills.

Languages

  • Python.
  • R.
  • C++.

Libs

  • Pandas / NumPy. Essential Python data handling libs.
  • Scikit-learn.
  • TensorFlow / Keras.
  • PyTorch.
  • PySpark. Connect your Python script to a Spark stack.
  • NLTK. Natural language processing lib.
  • SciPy.
  • MXNet. Deep learning lib.
  • XGBoost. Gradient boosted decision trees in Python and R.
  • CatBoost. Yandex's gradient boosted decision trees in Python and R.
  • LightGBM (LGBM). Microsoft's gradient boosted decision trees in Python and R.
  • Prophet. Facebook time series forecasting lib.
  • LIBSVM. Support vector machines, with a Python interface.

Tools

  • Apache Spark. With Hive and Airflow.
  • Hadoop.
  • Tableau.
  • Linux / Shell scripting.
  • Git / GitLab / GitHub.
  • Docker.
  • CI/CD. Jenkins, GitLab.
  • Elasticsearch.
  • Excel.

Clouds

  • Google Cloud. Cloud Functions, Cloud Storage, BigQuery.
  • AWS.

Database

  • SQL.
  • NoSQL / Relational algebra. Appears five times less often than SQL, but still worth learning.

Soft skills

Soft skills were mentioned nearly as often as "Python" or "TensorFlow", so they seem really important.

  • Communication. Being able to explain complex algorithms to non-technical clients or other employees. Being able to write reports and documentation on your research work.
  • Self-organisation. Being able to organise your work without direct instructions.
  • Business intelligence / CRM. Being able to understand how AI can improve a business and its customer relationship management.
  • Technology watch. Being able to organise and document a technology watch so your company and colleagues stay up to date with state-of-the-art techniques.
