
https://eugeneyan.com/writing/what-does-a-data-scientist-really-do/ #23

Open
utterances-bot opened this issue Sep 22, 2020 · 4 comments

Comments


utterances-bot commented Sep 22, 2020

What does a Data Scientist really do?

No, you don't need a PhD or 10+ years of experience.

https://eugeneyan.com/writing/what-does-a-data-scientist-really-do/


Absolutely spot on, Eugene. It would be really helpful if you could tell us more about these aspects:

  1. Building frameworks (e.g., validation) and pipelines
  2. Running experiments, monitoring, and analysing
  3. Putting the data product into production

How can one learn and apply these in a project? Is there a good end-to-end example that showcases these three steps?

PS: I really like your website and articles.

@eugeneyan eugeneyan changed the title https://eugeneyan.com/writing/what-does-a-data-scientist-really-do/?ck_subscriber_id=1009377720 https://eugeneyan.com/writing/what-does-a-data-scientist-really-do/ Sep 22, 2020
@eugeneyan
Owner

Wow, that's a difficult question. Here's my humble attempt at planning such a project that covers those three aspects:

Problem statement

  • Given historical price and Twitter data, can we predict the next day's stock price?

Building frameworks (e.g., validation) and pipelines

  • Data acquisition pipeline (e.g., Yahoo Finance and tweets on specific tickers)
  • Monitor frequency of tweets and Yahoo Finance data; notify if there is a long period without data
  • Validate correctness of the data format (though admittedly, Yahoo Finance and Twitter data are pretty clean; perhaps check for emoticons or non-ASCII characters)
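
To make the monitoring and validation bullets a bit more concrete, here is a minimal sketch using only the Python standard library. The function names and the record fields (`ticker`, `date`, `close`) are assumptions made up for illustration; they are not the schema of Yahoo Finance or the Twitter API, so adapt them to whatever your acquisition pipeline actually returns.

```python
from datetime import datetime, timedelta

# Hypothetical schema for one daily price record (an assumption, not a real API's format).
REQUIRED_FIELDS = {"ticker": str, "date": str, "close": float}


def find_gaps(timestamps, max_gap):
    """Return (start, end) pairs where consecutive records are more than max_gap apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


def validate_price_row(row):
    """Return a list of problems found in one price record; an empty list means it passed."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in row:
            problems.append(f"missing field: {field}")
        elif not isinstance(row[field], expected):
            problems.append(f"bad type for {field}: {type(row[field]).__name__}")
    if isinstance(row.get("close"), float) and row["close"] <= 0:
        problems.append("close must be positive")
    return problems


def has_non_ascii(text):
    """Flag tweets containing emoji or other non-ASCII characters for closer inspection."""
    return not text.isascii()
```

In a real pipeline, a non-empty result from `find_gaps` would trigger the notification, and rows that fail `validate_price_row` would be logged or quarantined rather than silently dropped.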

Running experiments, monitoring, and analysing

  • Predict tweet sentiment to aggregate public sentiment on stock ticker
  • Predict next day's price based on historical price and trending tweet sentiment
  • Monitor model performance of next-day stock price prediction
  • Error analysis on largest errors
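
For the experiment and monitoring bullets, one possible shape is a walk-forward evaluation: on each day, predict the next price using only earlier data, then track the error. The sketch below uses a naive "tomorrow equals today" baseline as a stand-in for a real model (my assumption, not a recommended predictor), and all function names are hypothetical.

```python
def naive_last_price(history):
    """Baseline stand-in for a real model: predict that tomorrow's price equals today's."""
    return history[-1]


def walk_forward_errors(prices, predict, min_history=3):
    """Predict prices[t] from prices[:t] for each day t, so no future data leaks in."""
    records = []
    for t in range(min_history, len(prices)):
        predicted = predict(prices[:t])
        actual = prices[t]
        records.append({
            "day": t,
            "predicted": predicted,
            "actual": actual,
            "abs_error": abs(predicted - actual),
        })
    return records


def mean_abs_error(records):
    """Single number to monitor over time, e.g., on a daily schedule."""
    return sum(r["abs_error"] for r in records) / len(records)


def largest_errors(records, k=3):
    """Return the k worst predictions as a starting point for error analysis."""
    return sorted(records, key=lambda r: r["abs_error"], reverse=True)[:k]
```

Any model that improves on the naive baseline's `mean_abs_error` is worth a closer look, and the days returned by `largest_errors` are where you would check for things like earnings announcements or unusual tweet volume.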

Putting the data product into production

  • Online dashboard with daily update
  • Visualize tweet and predicted sentiment
  • Visualize historical price, predicted price, actual price
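
For the daily dashboard update, the data behind the price chart could be assembled along these lines. This is only a sketch: the function name and the dict-of-dates inputs are assumptions for illustration, and the actual charting is left to whichever dashboard tool you choose.

```python
def build_dashboard_rows(actual_by_date, predicted_by_date):
    """Merge actual and predicted prices into one row per date for plotting.

    Both inputs map ISO date strings (YYYY-MM-DD) to prices. ISO dates sort
    chronologically as plain strings, so sorted() gives time order.
    """
    rows = []
    for date in sorted(set(actual_by_date) | set(predicted_by_date)):
        actual = actual_by_date.get(date)
        predicted = predicted_by_date.get(date)
        # The error is only defined once the actual price for that day is known.
        error = None if actual is None or predicted is None else predicted - actual
        rows.append({"date": date, "actual": actual, "predicted": predicted, "error": error})
    return rows
```

The most recent row will typically have a prediction but no actual price yet, which is why missing values are kept as `None` instead of being dropped.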

Thank you for your kind words!

@matrix21

matrix21 commented Oct 1, 2020 via email


Excellent article!


4 participants