Credit: Lindsay Bongo
Given that Analytics Engineer is a fairly new role on the data team, I wanted to compile a list of resources to be a one-stop knowledge shop.
First, let's be on the same page. What is an Analytics Engineer? Analytics Engineer Definition.
Video Definition with In-depth Explanation (for the audio learners)
Following the data-as-a-product model for data teams, Analytics Engineers (AEs) start with the business question. To be strategic business partners rather than siloed engineers, Analytics Engineers need sharp business acumen. Business needs are translated into data needs, and AEs supply the data that answers business stakeholders' questions.
The Looker Community is where business intelligence folks post and comment. Since this is where they hang out, it is also where they talk about the business metrics they care about, i.e. what they want to measure in order to move the business forward. It also shows how data is queried and computed.
On top of the usual business stakeholders, you also have your friendly Data Scientist who needs that dataset to create their predictive models :) Kaggle is where Data Scientists hang out. Kaggle's business datasets give an idea of the columns they would like to see in the data models they receive from AEs.
Now that you know the AE's general role and responsibilities, here are the skills needed to hit OKRs and business goals, along with some supplemental readings. Let's go!
MINDSET
DATA WAREHOUSE
SQL
PYTHON
DASHBOARDS / DATA VISUALIZATIONS
EMBEDDED DATA TEAMS IN CERTAIN FUNCTIONS:
- MARKETING
BIG IDEAS:
OTHER READINGS:
AE TRIBE:
These books/articles helped me to think better when analysing data.
- Common Data Mistakes to Avoid. Excellent summary of the most common fallacies when analyzing data. Very clear and well-explained.
- Thinking, Fast and Slow. Learning about bias can be super useful. For instance, I didn't have the reflex to think of a base rate any time I saw a figure.
- Fooled by Randomness. :book: Nassim Taleb taught me so much, both professionally and personally. In Fooled by Randomness, you will learn about major pitfalls when dealing with data in real life.
- Why you should care about the Nate Silver vs. Nassim Taleb Twitter war. Great chess players learn from high-Elo games. Great data people learn from debates between data experts.
- Five books every data scientist should read that are not about data science. I have not read them all yet, but these suggestions seem judicious.
- One analyst's guide for going from good to great
- Succeeding as the first data person in a small company/startup. A must-read for anyone working in data, even in a big company.
- Prioritizing data science work. Too many engineers like building ivory towers. Make sure you don't fall into the trap.
- Competitive Advantage by Michael Porter. A seminal book for defining how businesses compete.
- The Fifth Discipline by Peter Senge. This book lays the groundwork of systems thinking upon which many modern management and leadership books have been written.
- Harvard Business Review
- Strategy&. The strategy consulting business unit of PricewaterhouseCoopers (PwC), one of the Big Four professional service firms.
- The beginner guide to data engineering series. Start here if you don't know what a star schema or Airflow is, or want some basic practices for writing data pipelines.
- Best practices for data modeling. A lot of practical tips on naming, grain, permissions and materialization.
- The Data Warehouse Toolkit by Ralph Kimball. :book: A classic in Business Intelligence. Some chapters can be gold on modeling your data warehouse.
- Functional Data Engineering - a modern paradigm for batch data processing. You will learn the spirit behind good data pipelines and a well-designed data warehouse.
- The rise of the Data Engineer. Explains recent evolutions of the job and data practices.
- Five principles that will keep your data warehouse organized
- Using Postgres as a data warehouse. I wish I had read this post earlier. So much wisdom on Postgres.
- For Data Warehouse Performance, One Big Table or Star Schema? A discussion of an alternative to the star schema.
- Maintainable ETL: Tips for Making Your Pipelines Easier to Support and Extend. Best practices for writing good ETL.
- The Data Warehouse ETL Toolkit. :book: Once again, a very dense book, but you can find good ideas.
- Automated Testing In The Modern Data Warehouse. Practical advice on testing data. Useful for everyone building data pipelines. It is rare to find a post dealing with the non-sexy side of data.
SQL has a lot of tips and tricks that take time to learn.
- Mode Analytics SQL Guide. Very complete; even intermediate users can learn from this series of tutorials.
- Learning SQL 201: Optimizing Queries, Regardless of Platform by Randy Au. I finally found a complete post on advanced SQL.
- SQL Optimization from Chartio
- LeetCode Database Problems. All levels, clearly explained and walked through.
- Business Practice Problems. The caveat here is that you need a membership to get solutions, but there might be some good solutions in the comments. I advise using db-fiddle to create tables and develop solutions.
- Top Interview Q's from well-known tech co's. Not as challenging as Business Practice Problems (a good warm-up).
- Real-World Practice Problems
Python is a very broad subject. You can follow this list for more Python-focused readings.
- Python for Data Analysis. :book: A very comprehensive book about using Python for data work.
- Pandas Cheatsheet. I use it every day!
- Modern pandas. A series of blog posts on intermediate/advanced pandas written by one of the maintainers.
I found that reading code helps you learn best practices, whether it is Python or SQL.
In Python, reading some taps from Singer can teach you a lot.
In dbt/SQL, I like to browse a repo open-sourced by GitLab.
- Fundamentals of Data Visualisation. Complete guide to visualisation. Free version online.
- Data Driven Marketing. :book: Reading some chapters can help you think like a marketer with a data-driven approach. It's a gem. I didn't find this kind of insight elsewhere.
- Introduction to Algorithmic Marketing. :book: I found good ideas for making marketing initiatives more data-driven. Very dense, though; you can skip the equations.
- Building a data practice from scratch. Very useful for your first weeks as a data person.
- The Startup Founder's Guide to Analytics. An excellent introduction to the stack necessary for analytics and its evolution following the growth of the start-up.
- Engineers Shouldn't Write ETL. It's more data science focused, but it's a classic.
- Does my startup data team need a data engineer?
- The missing layer of Analytics Stack.
- Choosing a Data Warehouse. A lot of excellent answers on what to choose for your data warehouse.
- Data science for start-ups. You can find some useful information in this free book.
- Designing Data-Intensive Applications. :book: A fascinating read to learn more about databases, protocols, etc.
- The Modern Data Stack: Past, Present, and Future. A must-read on the latest innovations in the data stack.
Comparison of tools by Stephen Levin
- Looker vs Tableau vs Mode. Data Visualisation tools compared.
- Segment vs Fivetran vs Stitch: Which Data Ingest Should You Use?
The concept of analytics engineering is tightly coupled with the ELT view of data warehousing. It is interesting to learn from people who prefer ETL, for example the Reddit comments on Snowflake's super-expensive cost.
The GitLab data team also made an excellent list. (close to mine)
Analytics Dispatch by Mode Analytics. Very comprehensive.
I really love Readings in Applied Data Science for a more data-science-focused view.
Knowing more about programming is a huge asset. For instance, the Professional Programming list is quite complete.
- Randy Au. You can read almost all his posts; they are all very relevant for analytics engineers.
- Locally Optimistic. A blog dedicated to data in organizations.
- Tristan Handy. I also love his newsletter: Data Science Roundup.
- dbt blog. 90% of the articles are close to must-reads.
- Ken Farmer. It is healthy to read from those who still prefer the ETL stack.
- Holistics.io. About the contemporary practice of business intelligence.
- Locally Optimistic. Highly recommend joining their Slack!
- Reddit's Data Engineering. The ETL, Business Intelligence, and Data Science channels are also good.
- Slack channels
- MeetUp
- EventBrite
I really appreciate any contribution. If you do, please make sure to describe the theme and why you found the resource useful.