I've been working in data analytics pretty much since the back half of my junior year of college. When I first started at Rice, I was totally unsure of what I wanted to do career-wise, but I'd like to think I've gotten a little bit more clarity on my direction since then. Thanks to sports, I've come to really love working with data. Not just in the number-crunching or modeling sense, but also with data visualization.
My focus on data visualization has really picked up over the last year and a half or so. When I showed Elijah Meeks my network graphs from my first sports analytics project back in 2016, he audibly recoiled (or as audibly as you can on a digital medium, I suppose). Everyone has to start somewhere. And that's important! Even today, I've still got 10 tabs of package documentation open as I code. But after a couple of years dedicated to improving my coding and technical data viz skills, I've picked up some best practices and reusable modules that I like to think of as my guide points. I'm laying them out in this repository for anyone else looking to do data visualization in the wonderful world of Python.
- Advanced Packages: While Seaborn and Matplotlib can get you fairly far, they can fall short on features like interactivity or unique chart types. However, there are some other really powerful 3rd party packages out there in the Python ecosystem. In this notebook, I'll take a little bit of time to discuss these packages, show how to get familiar with the syntax, and walk through some unique examples highlighting the capabilities of each package.
- Style Sheets: The default Matplotlib styling can be a great source of consternation, and everybody will have their own unique styles and needs. However, it can be tedious to write the same repetitive code over and over to customize each graphic. Fortunately, we can pretty easily customize persistent configurations for our graphs, and I'll demonstrate the various ways of doing so in this notebook.
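
As a rough illustration of the persistent configuration idea in the Style Sheets item above, here is a minimal sketch using Matplotlib's `rcParams`. The `HOUSE_STYLE` name and its values are hypothetical placeholders, not the settings used in the notebook, and the `Agg` backend line is only there so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a Jupyter notebook
import matplotlib.pyplot as plt

# Hypothetical "house style": the keys are real rcParams, the values are illustrative.
HOUSE_STYLE = {
    "figure.figsize": (8, 5),
    "axes.spines.top": False,    # hide the top border of each plot
    "axes.spines.right": False,  # hide the right border of each plot
    "axes.grid": True,
    "grid.alpha": 0.3,
    "font.size": 12,
}

# Option 1: apply once, and every figure created afterwards picks it up.
plt.rcParams.update(HOUSE_STYLE)

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [1, 3, 2, 4])

# Option 2: override settings temporarily, for a single block of plotting code.
with plt.rc_context({"font.size": 18}):
    font_size_inside = plt.rcParams["font.size"]
font_size_after = plt.rcParams["font.size"]  # back to the persistent value
```

The first option suits a notebook where every chart should share one look; the context manager is handy when a single chart needs to deviate without disturbing the rest.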
A: Reading documentation, as with most coding, is honestly the best way to get intimately familiar with a library. But, also as with most coding, the second best way is to just start doing. I'm hoping that this repository eases your transition to "just doing" or maybe shows you a technique that you didn't know previously. What really works for me is taking examples that I find in the docs for a library and then googling and reading documentation as I manipulate every aspect of the example until I'm satisfied with the variety of outputs. What also works for me is creating this repo so I don't forget the things that I've learned...
A: Ah, here we come to the crux of the subject. Using completely public or synthetic data, I've built a set of notebooks that span a range of topics in Python data viz, from fundamentals to different graph types to building complex objects like ridge plots. The following Jupyter notebooks (which can render completely on GitHub in your browser) are currently included:
- Part 1 Key Principles: Fundamentals and best practices of data visualization using Matplotlib, Python's ground-level data viz library
- Part 2 Archetypes of Viz: Creating various data viz archetypes and discussing use cases
- Part 3 Analytics Viz: Demonstrating the incorporation of data viz as part of the scientific/research process
- Part 4 Complex Viz Manipulation: Techniques for manipulating our fundamental viz templates to create some more complex data viz and data viz systems
- Part 5 Style Sheets: Methods for setting up graph styling customizations that will persist through your notebook or script
- Part 6 Advanced 3rd Party Packages: Introducing and discussing powerful packages to extend our data visualization capabilities in Python
A: You could always clone the repo, but if you want to see the entire rendered notebook, I recommend using nbviewer, which, unlike GitHub's preview interface, will display JS and rich graphs.
A: This repo is a living destination, and I consider it to be perpetually a work in progress as I continue learning and get new inspiration. With that said, there are still some more notebooks that I'm already building out:
- Interactive visualizations, showing how to build animations as well as interactive controls inside a Jupyter notebook
A: So this is not really a question, but please reach out if you have any feedback or have any requests/ideas/inspiration. I can be found on Twitter @SENTHIS.