Links
Ryan McGranaghan edited this page Mar 2, 2023
This page contains links to useful resources. Please note that these lists are in no way exhaustive; they are meant to provide a foundation from which to discover the massive amount of material out there, and to do so in a digestible and manageable way. Keep checking back here, as this is a fluid list and will evolve over time.
Happy exploring!
General
- Apache Software Foundation
- FigShare's 'The State of Open Data'
- Data Science Weekly
- Association for Computing Machinery (ACM)
- The Python Graph Gallery
- In general, graph galleries can be very helpful for learning about data visualization techniques, quickly creating new visualizations, and improving your understanding of the data
- Open Scientist Handbook - a reference work you can use to become an open science change agent in your department, laboratory, college, learned society, or research agency
Books
- The Elements of Statistical Learning
- Data Science From Scratch
- Python Data Science Handbook
- Introduction to Machine Learning with Python
- Computer Age Statistical Inference
- Machine Learning - Tom Mitchell
- Think Like a Data Scientist: Tackle the Data Science Process Step-by-Step
- Developing Analytic Talent: Becoming a Data Scientist
- An Introduction to Statistical Learning with Applications in R
- A Simple Introduction to Data Science: Book 2
- Networking for Big Data
- Python for Data Analysis
- Big Data Management for Dummies, 2nd Informatica Special Edition
- Neural Networks and Deep Learning
- Deep Learning Cookbook: Practical recipes to get started quickly
- Machine Learning with TensorFlow, Second Edition
- (for absolute beginners) Model-Based Machine Learning, John Winn, Chris Bishop, et al. Bishop wrote the book on ML in the 2000s (Pattern Recognition and Machine Learning, PRML); this is his latest book, aimed at a non-expert audience and presented using a fun murder mystery narrative
- (intro to deep learning) Deep Learning, Ian Goodfellow, Yoshua Bengio and Aaron Courville. It has become the standard book on DL and is co-written by one of the "godfathers" of the field (Bengio). Takes a very pragmatic view (aimed at DL engineers)
- (intro to probabilistic ML) Machine Learning: A Probabilistic Perspective, Kevin Murphy. It has come to replace PRML (the book on ML from the 2000s). Murphy has been in the field forever.
- (practical coding in DL) Dive into Deep Learning, Zack Lipton, Alex Smola, et al. An interactive deep learning book with code, math, and discussions; provides NumPy and PyTorch tutorials as IPython notebooks. Smola has been in the field forever, and Zack is well known in the field
- (maths foundations for ML) Mathematics for Machine Learning, Marc Deisenroth et al. Covers the mathematical foundations needed to understand ML beyond just implementing stuff. Deisenroth has been around for quite some time as well (working mostly in model-based reinforcement learning, MBRL, actually)
Academic Journals
- Artificial Intelligence
- Big Data
- Big Data Research
- International Journal of Data Science and Analytics
- Journal of Big Data
- SIGKDD Explorations
- Institute of Electrical and Electronics Engineers Big Data
- ACM Knowledge Discovery from Data
- ACM Transactions on Data Science
Learning Communities/Communities of Practice
- The NASA Center for HelioAnalytics (or contact Ryan McGranaghan) - building a Community of Practice around an informatics/data science approach to Heliophysics science
- The Earth Science Information Partners (ESIP) - one of the foremost leaders in promoting the collection, stewardship and use of Earth science (and related) data, information and knowledge that is responsive to societal needs
- The AI Learning Salon - a weekly forum to explore bridges and contentions in biological and artificial learning
- The Royal Institution - YouTube series covering a wide range of science topics, including data and intelligence
Tutorials
Online courses
- Learning From Data (Introductory Machine Learning) - California Institute of Technology
- Santa Fe Institute's "Introduction to Open Science" course - learn the new language needed to understand open science principles and tools
- CS109 Data Science - Harvard
- Computational Probability and Inference - Massachusetts Institute of Technology
- Data Science - Johns Hopkins
- Machine Learning - Andrew Ng
- Stanford CS231n: Convolutional Neural Networks for Visual Recognition
- How to Process, Analyze and Visualize Data
- Data Visualization with Python
- Lynda tutorials -- this is an excellent resource, though it requires a membership
- Earth Data Science -- contains open tutorials and course materials covering topics including data integration, GIS and data intensive science (>250 Earth data science lessons)
- Interpretability and Explainability in Machine Learning - Harvard University
- Elements of AI - a wonderful example of an online learning resource for a data science topic
Elements to look for in online courses
- Focuses on practical skills. Those that are perhaps most wide-reaching include Python and R programming, Jupyter Notebooks, scikit-learn, TensorFlow and Keras, pandas, xarray
- Provides quick, quality, and consistent feedback
- Is free or inexpensive (paying for a course that is worthwhile is a good way to get yourself to commit to it!)
- Is project-oriented
- Contains an excellent social interaction component
Other tutorial and learning resources
- Santa Fe Institute Complexity Explorer - online courses, tutorials, and resources essential to the study of complex systems
- Software Carpentry
- eScience Institute Tutorials (University of Washington)
- Peter Norvig's 'Pytudes' GitHub resource - Python programs to practice or demonstrate skills
Data Visualization resources
- Book: The Visual Display of Quantitative Information by Edward Tufte
- Book: Storytelling with data by Cole Nussbaumer Knaflic
- Book: The Functional Art: An Introduction to Information Graphics and Visualization by Alberto Cairo
- Article: "A tour through the visualization zoo" by Heer et al.
- Tool: Tableau
- Tool: Plotly
- Tool: ParaView
- Tool: Bokeh
- Tool: HoloViz
- Tool: Cinema Science
- Interactive Course: Information Visualization (excellent resource with interactive Jupyter notebooks for each lesson)
Compilations of resources
- A fantastic set of links and resources is available on the HelioAnalytics website operated by Barbara Thompson and colleagues at the NASA Goddard Space Flight Center
- Top Data Science Resources
Blogs
- Data Science Central
- Data Science on Reddit
- No Free Hunch (a Kaggle blog)
- What's the Big Data?
- Information Is Beautiful
- KDnuggets
Podcasts
- Artificial Intelligence Podcast with Lex Fridman
- Sean Carroll's Mindscape Podcast
- Data Skeptic
- Data Stories
- The O'Reilly Data Show
- Talking Machines
- DataCamp
- Future Tense
- Microsoft Research
- Voices from DARPA
- Berkeley School of Information
- Grey Mirror Podcast
Ways to become active (i.e., the best way to learn)
- Start working on open source projects (see links below)
- Compete in a Kaggle competition
- Join or start a Meetup and attend or host a Hackathon
- Collaborate with a data scientist (e.g., find one at your university or work)
- Reach out to a potential mentor
- Take an online course
- Explore your passions in a data-driven manner
- 'Lurk' - join community email lists or forums to gain exposure to the language before contributing more actively
Open source projects and links
- Apache Software Foundation: Mission is to provide software for the public good
- Papers with Code: Mission is to create a free and open resource with Machine Learning papers, code and evaluation tables
- Python scikit-learn: Mission is to provide a free and open machine learning library in the Python programming language
- Open Source Data Science Masters (OSDSM): Mission is to provide an open-source curriculum for learning Data Science. Foundational in both theory and technologies, the OSDSM breaks down the core competencies necessary for making use of data.
Specific topics
Explainable Artificial Intelligence (XAI)
- DARPA XAI Program
- Here's a good overview of the explainable AI methods that performers in DARPA's XAI program explored. It's a good retrospective on the entire program, with an excellent reference list
- The Ethics and Governance of AI (an MIT Media Lab Course)
- Interpretability and Explainability in Machine Learning (a Harvard University course)
- The Mythos of Model Interpretability
- The AI Now Institute - an interdisciplinary research center dedicated to understanding the social implications of artificial intelligence
- (from Mark Wronkiewicz) This is an aging blog post, but it helped me quite a bit when I was trying to get up to speed. It’s high level with good breadth (explains why you should care, name drops many important methods, and gives frameworks for mentally organizing those different methods). It’d also be good for the reference list
Frameworks for trustworthy, accountable, explainable systems (AI and other)
- Findability, Accessibility, Interoperability, and Reusability (FAIR) Principles
- A framework for model/system accountability: Algorithmic Impact Assessments
AI and ML applied to science
- Learn from a fantastic use case for ML and AI in the sciences: EarthML
- SpaceML: Started in December 2020 as a machine learning toolbox and developer community building the next generation AI applications for space science and exploration
Scientific/research workflows
Open Science
- NASA's Transform to Open Science (TOPS) - excellent compilation of resources
- Santa Fe Institute's "Introduction to Open Science" course - learn the new language needed to understand open science principles and tools
The social component of data science - becoming transdisciplinary
- Antidisciplinary
- American Geophysical Union Town Hall meeting
- International Bateson Institute
- Directory of behavioral teams
Tools to improve virtual collaboration
Compilations of resources:
Post-it Note-like boards and resources:
- Milanote
- Whimsical
- Kumu - very useful for quickly building networks and knowledge graphs
- Miro - LOVE this - easy and free; a powerful post-it note board
- Mural
- Smartsheet
- Figma
- Whova
- Padlet - collaborative, super-powered sticky notes with drawing
- Pinup
- Google Jamboard
Polling resources:
Virtual Conference Tools:
- Wonder.me
- Gather Town
- Remo Conference
- Jitsi - free service
- Zoom
- Microsoft Teams
General Interaction Tools:
- GitHub - underrated as a full-stack collaboration tool (even for writing papers and proposals, not just for software)
- Slack
- Discord - place for teams to 'hang out' and work
- QiqoChat - recommended by the Earth Science Information Partners group; a wrapper around Zoom (other platforms, including Jitsi, are also possible). It takes virtual meetings to the next level and encourages engagement in a variety of ways, not just webinar-style watching
- Whereby - like Zoom, but in many ways simpler, easier (no downloads or installs, same link each time); free for individual use