# Data Visualization, Analysis, and Modeling in Python Wrap-Up


Congratulations! You've not only reached the end of this course on Data Visualization, Analysis, and Modeling in Python, but if you got here through the other courses in our *Python Programming for Data Science* specialization, then you've also reached the end of the specialization! Doing so is a *huge* accomplishment and one you should feel exceedingly proud of.


But you probably didn't take these courses for their own sake; you completed them to learn skills you can deploy in the real world! That means this isn't the end of your data science journey; it's just the beginning.


So what's next?



## If you've finished this course (but didn't take the other courses in the specialization)


This course is the fifth in a series of five courses on [Python Programming for Data Science](https://www.coursera.org/specializations/python-for-data-science). As with this course, the focus of this specialization is on learning to work effectively in Python. The specialization covers everything from algorithmic thinking to debugging and testing code, from basic Python syntax to effective data manipulation with numpy and pandas, and from writing short scripts to full programs like a poker simulator.


Also, like this course, the other courses in this specialization don't try to cover topics like machine learning model fitting and diagnostics or statistical inference (in fact, unlike this course, where we have to touch on these topics to talk about implementation, the other courses in the specialization don't really get into these topics at all). To be clear, that's not because we feel these topics are unimportant, but rather because we don't feel it's feasible to teach both the programming side of data science and the inference side in the same course and do both justice. If you're looking for an overview of all facets of data science, there are already a lot of courses that cover a bit of everything, but we wanted to create a course that really dove deep into just the programming side of data science. Not only does this allow us to provide a lot more depth than we could in a course with a broader curriculum, but we also know many students (e.g., social scientists) who already have a strong background in statistics and inference and are looking for courses that complement a more traditional training in statistics or social science.


So, if you enjoyed this course and are wondering where to go next, consider checking out the other courses in this specialization!




## If you've completed the specialization


WOW! Seriously, congratulations. This is *not* an easy specialization. We set expectations extremely high in this specialization. The material we've covered comes directly from the courses we teach at Duke University in the [Masters of Interdisciplinary Data Science](https://datascience.duke.edu/). Indeed, if you've gotten this far, you don't need me to tell you we've set the bar higher than what you'll find in most Coursera specializations.


Now that you've completed this specialization, you can be confident you have developed a robust foundation for the rest of your data science journey. Not only do you know how to write data science code in Python, numpy, pandas, matplotlib, and more, but you've also learned the *principles* of good data science programming. These principles, far more than the specific syntax you've learned, are the real takeaway from this specialization, because nowhere is the Buddhist principle "All is change" truer than in data science. Libraries and packages will change, programming languages will evolve, and even the methods data scientists use will never stand still. But because you understand not just "what" to do when you sit down at a computer but also "why" you do what you are doing, you will be able to ride the waves of changing specifics with a clear sense of purpose.


### What Next?


Now that you've finished this specialization, you may be wondering what you should do next. If you already have a strong foundation in machine learning and statistical inference, honestly, you're in really good shape to start putting everything you know into action (and if you do need to learn more, what you should study probably depends on what interests you and/or what work you want to do).


If you don't already have a strong foundation in statistics and machine learning, then we suggest you consider a course on statistical inference, machine learning, or both. This specialization has focused on how to write code and manipulate data effectively, but that's only one part of data science. The other part is learning to use that data to solve problems. Depending on the types of problems you're trying to solve, you may already have enough knowledge to be impactful. In my experience, the most important thing a data scientist can do is often to gather all the data lying around an organization or business, pull it together, and provide stakeholders with some basic summary statistics and insights.


But if you want to go beyond that, you'll probably want to take a course in statistics or machine learning to help you learn to look for more subtle patterns in data and to better evaluate how confident you should be that the patterns you find are real.





#### Statistics and Machine Learning


Wait, but what *is* the difference between statistics and machine learning? The terms are actually not particularly well-defined, though, if asked, many people will confidently give you a definition. That definition will just be different everywhere you go. :) I would argue the answer has more to do with people, perspective, and how universities are organized than with anything deep and conceptual. Namely, statistics is how statisticians (and social scientists) think about data science, while machine learning is how computer scientists think about data science. Why are those different? Well, to answer that question, we need to talk a little about academia.


Data science, when done well, is a fundamentally interdisciplinary undertaking. Over the past several decades, the proliferation of new sources of data and computing power has resulted in nearly every academic discipline developing new computational techniques. But for all that universities *love* to pay lip service to the importance of interdisciplinarity, the reality is that universities are starkly divided into disciplinary silos (e.g., computer science, statistics, political science, economics, and engineering). This isn't because researchers aren't *interested* in interdisciplinary collaborations, but rather because their professional imperatives push them to focus their attention on the priorities and language of their own departments and disciplines.


As a result, nearly every academic discipline has developed a perspective on what is broadly called "data science" that emphasizes its own intellectual priorities. And there has been *shockingly* little work done to create a unified perspective on the tools developed across disciplinary boundaries.


To illustrate, suppose we were interested in using patient data to reduce heart attacks. A computer scientist looking at this problem might use their discipline's methods to *predict* which patients are most likely to experience a heart attack in the future using current patient data; a social scientist might focus on trying to understand the *effect* of giving patients a new drug on heart attack risk; and a statistician might focus on understanding *how confident* we should be in the conclusions reached by the computer scientist and social scientist.


This fragmentation has also resulted in a fragmentation of *language* around data science methodologies. Disciplines often come up with different terminology for the same phenomena, adding another layer of difficulty to efforts to work across departmental silos.


The result is a situation analogous to the Buddhist parable of the blind men and the elephant, wherein a group of blind people come upon an elephant and, upon laying hands on different parts of the elephant, come to different conclusions about what lies before them. The person touching the tail declares, "We have found a rope!", while the person touching the leg declares, "We have found a tree!"


![blindfolded scientists feeling an elephant](img/blindmenelephant.jpg)


And yet, as the poet John Godfrey Saxe wrote in his poem [*The Blind Men and the Elephant*](https://en.wikipedia.org/wiki/Blind_men_and_an_elephant#John_Godfrey_Saxe) about this parable many centuries later:


```
And so these men of Indostan,
Disputed loud and long,
Each in his own opinion
Exceeding stiff and strong,
Though each was partly in the right,
And all were in the wrong!
```


In recent years, however, there has been a growing appreciation of what can be gained from pulling together the insights that have been developed in different fields, despite the challenges that language and professional imperatives pose to such collaborations. And that, at least amongst those who are serious about the development of data science as a discipline and not just a buzzword to use when raising money, is the promise of data science: to unify the different perspectives and methods for analyzing data. Or, to put it more succinctly: to finally see the whole elephant.


But until this happens, it is incumbent upon you, the budding data scientist, to seek out these different perspectives yourself.


So, what differentiates statistics and machine learning? I suggest it's mostly who is teaching the class. Both will lay claim to techniques like linear regression or logistic regression, but the way these techniques are taught by a computer scientist will likely be different from how they are taught by a statistician. And both of the perspectives they offer will be correct. So consider learning a little of both!
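
To make that concrete, here is a minimal sketch of one simple linear regression read in two ways. Everything in it is an illustrative assumption rather than material from the course: the toy data are made up, the helper names (`fit_line`, `predict`, `slope_standard_error`) are hypothetical, and it uses only the standard library so it runs anywhere. The first reading asks how well the model predicts, which is roughly how a machine learning class might frame it; the second asks how confident we should be in the estimated slope, which is roughly how a statistics class might frame it.

```python
import math

# Hypothetical toy data (made up for illustration):
# x = hours of exercise per week, y = resting heart rate.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [78.0, 77.0, 73.0, 74.0, 70.0, 69.0, 66.0, 65.0]


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def predict(intercept, slope, new_x):
    """Predicted y for a single x value."""
    return intercept + slope * new_x


def slope_standard_error(x, y, intercept, slope):
    """Classical standard error of the slope (assumes i.i.d. errors)."""
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    residuals = [yi - predict(intercept, slope, xi) for xi, yi in zip(x, y)]
    residual_variance = sum(r ** 2 for r in residuals) / (n - 2)
    return math.sqrt(residual_variance / sxx)


intercept, slope = fit_line(x, y)

# "Machine learning" reading: how far off are the predictions?
# (In-sample error only, to keep the sketch short; in practice you
# would evaluate on held-out data.)
squared_errors = [(yi - predict(intercept, slope, xi)) ** 2 for xi, yi in zip(x, y)]
rmse = math.sqrt(sum(squared_errors) / len(x))
print(f"Prediction error (RMSE): {rmse:.2f}")

# "Statistics" reading: how precisely is the slope estimated?
se = slope_standard_error(x, y, intercept, slope)
print(f"Estimated slope: {slope:.2f} (standard error {se:.2f})")
```

The fitted line is identical in both readings; only the question being asked of it changes, which is the sense in which both perspectives can be correct.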

#### Specific Coursera Offerings

- [Andrew Ng's Machine Learning Specialization](https://www.coursera.org/specializations/machine-learning-introduction): Andrew's ML courses are as much of a "classic" as anything can be in such a young field.
- [Specialization in Data Analysis Statistics](https://www.coursera.org/specializations/statistics): This is a *great* specialization from an exceptional instructor. The only downside (for someone coming from a Python course like this one!) is that examples are given in R. With that said, given all you've learned in this specialization, you should have no problem taking the course for its *conceptual* lessons about statistics and doing the exercises in Python.
- [Statistics with Python Specialization](https://www.coursera.org/specializations/statistics-with-python): A well-rated specialization in statistics (in Python).
- [Introduction to Machine Learning (Duke)](https://www.coursera.org/learn/machine-learning-duke/): The name is a bit misleading, as this course focuses on one specific tool used a lot in machine learning (neural networks), but it's a very good and highly rated stand-alone course (not a specialization).
