A quick description of the roles found in organizations that use ML at Scale
"Moore's Law is dead." - Bill Dally, NVIDIA Chief Scientist
The quote above was made in reference to increasing hardware performance for training and testing on large datasets. But to increase the performance and scale of human operations, the standard approach is to create increasingly specialized roles.
ML is all well and good, but in a business context it must deliver value. For that reason, it makes sense to create specialist roles within a business's ML department.
There are four roles in a production ML department: a business-facing role, a data engineering role, a data science role, and a production engineering role.
All of these roles need some understanding of, and experience with, ML practices.
The business-facing role is outward-facing and over-arching: it is about re-framing ML practices within a business framework, in terms of business value (as defined by deliverables).
It is probably the least technical role, although technology and ML expertise still play a large part in how valuable this person can be.
The data engineering role is largely technical and mainly involves ETL; data cleaning and transformation play a large part, and populating and/or enriching sparse data may also be required. Database knowledge and expertise (covering both traditional relational [SQL] and NoSQL [CQL, GraphQL] databases) are critical for this role.
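To make this concrete, here is a minimal sketch of an extract-transform-load pass in Python, assuming a hypothetical customers table in a local SQLite database; the table, column names, and fill rules are all invented for illustration.

    # A minimal ETL sketch (hypothetical table and column names).
    import sqlite3
    import pandas as pd

    # Extract: pull raw rows from a relational store (SQLite here for simplicity).
    conn = sqlite3.connect("warehouse.db")
    raw = pd.read_sql_query(
        "SELECT customer_id, age, region, monthly_spend FROM customers", conn
    )

    # Transform: normalise types, drop obviously bad rows, enrich sparse columns.
    raw["age"] = pd.to_numeric(raw["age"], errors="coerce")
    raw = raw.dropna(subset=["customer_id"])
    raw["age"] = raw["age"].fillna(raw["age"].median())    # fill sparse numeric data
    raw["region"] = raw["region"].fillna("unknown")        # fill sparse categorical data

    # Load: write the cleaned table back for the data scientists to consume.
    raw.to_sql("customers_clean", conn, if_exists="replace", index=False)
    conn.close()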
Data science is the most purely technical role. Experienced data scientists are very hard to find and generally have very specific (and deep) expertise and training. For this reason, it is best to provide them with clean data so that they can focus their energies on the tasks they have been trained to do.
Mathematics, statistics, probability theory, and expertise in either R or Python seem to be the main requirements for this role.
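As a rough illustration of that focus, the sketch below trains and evaluates a simple model in Python with scikit-learn, assuming the data engineering role has already delivered a clean table; the file and column names (and the choice of logistic regression) are purely illustrative.

    # A minimal model-training sketch (hypothetical data and columns).
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Assume the data engineer has already produced a clean table.
    data = pd.read_csv("customers_clean.csv")
    X = data[["age", "monthly_spend"]]
    y = data["churned"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    model = LogisticRegression()
    model.fit(X_train, y_train)

    print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))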
The production engineering role calls for a generalist. Once an ML model has been trained, it must be deployed in order to become operational and deliver business benefit.
The how and why of this deployment can be very technical, but it generally involves a number of separate disciplines, mostly drawn from more traditional IT practices.
Generally, this is the role that a traditional data scientist is least qualified to perform, so it is a good idea to source a candidate with a much broader skill-set. What those skills need to be depends largely on the pre-existing production systems.
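As one example of what deployment might look like, the sketch below wraps a previously saved model in a small HTTP prediction service; Flask and joblib are arbitrary illustrative choices here, and the file name and payload format are assumptions.

    # A minimal model-serving sketch (Flask and joblib chosen purely for illustration).
    from flask import Flask, request, jsonify
    import joblib

    app = Flask(__name__)
    model = joblib.load("model.joblib")   # model previously saved by the data scientist

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json()      # e.g. {"features": [[34, 120.5]]}
        prediction = model.predict(payload["features"]).tolist()
        return jsonify({"prediction": prediction})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)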
Trained models degrade quickly, which means that models must be re-trained from time to time (as new data becomes available) and likewise re-deployed.
This largely involves the data scientist and the production engineer, who must have knowledge of staged deployments and, ideally, expertise with CI/CD practices.
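A minimal sketch of such a re-training step, assuming fresh data arrives as a flat file and a candidate model is only promoted if it scores at least as well as the currently deployed one (the file names and the acceptance rule are invented; in practice this gate would sit inside a CI/CD pipeline with a staged rollout):

    # A minimal re-training sketch (file names and acceptance rule are invented).
    import pandas as pd
    import joblib
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    fresh = pd.read_csv("customers_clean_latest.csv")
    X = fresh[["age", "monthly_spend"]]
    y = fresh["churned"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    # Train a candidate on the new data and score it on a held-out split.
    candidate = LogisticRegression().fit(X_train, y_train)
    candidate_score = accuracy_score(y_test, candidate.predict(X_test))

    # Score the currently deployed model on the same held-out split.
    current = joblib.load("model.joblib")
    current_score = accuracy_score(y_test, current.predict(X_test))

    # Only promote the candidate if it is at least as good as the deployed model.
    if candidate_score >= current_score:
        joblib.dump(candidate, "model.joblib")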
http://en.wikipedia.org/wiki/Extract%2C_transform%2C_load
[Well worth reading for some historical insight into this process.]
http://en.wikipedia.org/wiki/Moore's_law
[States (a little sniffily) that Moore's law is the observation that the number of transistors in an integrated circuit doubles about every two years.]
https://changelog.com/practicalai/15
[A fairly interesting overview of recent work from NVIDIA.]
http://changelog.com/practicalai/9
[Inspired by this podcast on behavioral economics and AI-driven decision making. Practical AI is an excellent podcast; some great stuff and worth following.]