ssarrayya/datacamp-projects

DataCamp Guided Projects

A collection of all the projects I have completed on DataCamp.

On SQL:

  • Analyzing NYC Public School Test Result Scores: Every year, school test results play a role in deciding the fate of millions of students. In America, the SAT is a major part of the college admissions process. In this project, I worked with a SQL database containing test performance from NYC's public schools. I looked at how performance varies by borough, identified how many schools fail to report information, and found the top ten performing schools across the city!

  • Exploring the Bitcoin Cryptocurrency Market: To better understand the growth and impact of Bitcoin and other cryptocurrencies, I explored the market capitalization of different cryptocurrencies. Warning: The cryptocurrency market is exceptionally volatile, and any money you put in might disappear into thin air. Never invest money you can't afford to lose.

  • Optimizing Online Sports Retail Revenue: Sports clothing is a booming sector! In this notebook, I used my SQL skills to analyze product data for an online sports retail company. I worked with numeric, string, and timestamp data on pricing and revenue, ratings, reviews, descriptions, and website traffic. I used techniques such as aggregation, cleaning, labeling, Common Table Expressions, and correlation to produce recommendations on how the company can maximize revenue!

  • Analyze International Debt Statistics: Individuals are not the only ones who take on debt to manage their necessities. A country may also take on debt to manage its economy. For example, infrastructure spending is one costly ingredient required for a country's citizens to lead comfortable lives. The World Bank is an organization that provides loans to countries. In this project, I analyzed international debt data collected by The World Bank. The dataset contains information about the amount of debt (in USD) owed by developing countries across several categories. I found the answers to questions like: What is the total amount of debt owed by the countries listed in the dataset? Which country owes the largest amount of debt, and what does that amount look like? What is the average amount of debt owed by countries across different debt indicators? The data used in this project is provided by The World Bank. It contains both national and regional debt statistics for several countries across the globe, as recorded from 1970 to 2015.

  • What and Where are the World's Oldest Businesses: An important part of business is planning for the future and ensuring that the business survives changing market conditions. Some businesses do this remarkably well and last for hundreds of years. In this project, I explored data from BusinessFinancing.co.uk on the world's oldest businesses: when were they founded, and which industries do they belong to? Like many business problems, the data is contained in several different datasets. To understand the world's oldest businesses, I used joining techniques to merge the data. From there, I used manipulation tools such as grouping and filtering to answer questions about these historic businesses.

  • Analyzing American Baby Name Trends: What makes a name timeless or trendy? In this project, I used data published by the U.S. Social Security Administration spanning over a hundred years to understand American baby name tastes. The ranking, grouping, joining, ordering, and pattern matching skills I used in this project are broadly applicable: understanding changing tastes is a key competency for businesses as well as parents searching for a baby name!

  • When Was the Golden Age of Video Games? In this project, I analyzed video game critic and user scores as well as sales data for the top 400 video games released since 1977. I searched for a golden age of video games by identifying release years that users and critics liked best, and explored the business side of gaming by looking at game sales data.
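
As a rough sketch of the three questions listed under the international debt project above (total debt, the country owing the most, and average debt per indicator), the queries might look like the following. The table name international_debt, its column names, and the four toy rows are assumptions invented for illustration; they are not the World Bank data or the project's actual code. Python's built-in sqlite3 module is used only to make the SQL runnable.

```python
import sqlite3

# Hypothetical schema and toy rows, invented purely for illustration;
# they are NOT taken from the World Bank dataset.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE international_debt "
    "(country_name TEXT, indicator_code TEXT, debt REAL)"
)
conn.executemany(
    "INSERT INTO international_debt VALUES (?, ?, ?)",
    [
        ("Country A", "IND.ONE", 100.0),
        ("Country A", "IND.TWO", 50.0),
        ("Country B", "IND.ONE", 300.0),
        ("Country B", "IND.TWO", 150.0),
    ],
)

# Question 1: total debt owed by all countries in the table.
total_debt = conn.execute(
    "SELECT SUM(debt) FROM international_debt"
).fetchone()[0]

# Question 2: the country owing the largest total amount.
top_debtor = conn.execute(
    """
    SELECT country_name, SUM(debt) AS total
    FROM international_debt
    GROUP BY country_name
    ORDER BY total DESC
    LIMIT 1
    """
).fetchone()

# Question 3: average debt per indicator, largest average first.
avg_by_indicator = conn.execute(
    """
    SELECT indicator_code, AVG(debt) AS average
    FROM international_debt
    GROUP BY indicator_code
    ORDER BY average DESC
    """
).fetchall()

print(total_debt)
print(top_debtor)
print(avg_by_indicator)
```

The same GROUP BY plus aggregate pattern (SUM, AVG, ORDER BY ... LIMIT) carries over to most of the SQL projects in this list.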

On Python:

  • Investigating Netflix Movies and Guest Stars in The Office: Here, I investigated whether Netflix’s movies are getting shorter over time and which guest stars appear in the most popular episode of "The Office", using everything from lists and loops to pandas and matplotlib. I also gained experience in an essential data science skill: exploratory data analysis.

  • Comparing Search Interest with Google Trends: Time series data is everywhere, from temperature records to unemployment rates to the S&P 500 Index. Another rich source of time series data is Google Trends, where you can freely download the search interest of terms and topics from as far back as 2004. This project dives into manipulating and visualizing Google Trends data to find unique insights. I explored the search data underneath the Kardashian family's fame and made custom plots to find how the identity of the most famous Kardashian/Jenner sister has changed over time. This is a guided project.

  • Dr. Semmelweis and the Discovery of Handwashing: In 1847, the Hungarian physician Ignaz Semmelweis made a breakthrough discovery: handwashing. Contaminated hands were a major cause of childbed fever, and by enforcing handwashing at his hospital he saved hundreds of lives.

  • Recreating John Snow's Ghost Map: In 1854, Dr. John Snow (no, not the Game of Thrones character) used a pre-computer method of spatial analysis by mapping patterns and occurrences of cholera outbreaks in Soho, London. He mapped the deaths in the neighbourhood and determined that a vast majority occurred around one particular water well and that those who died had used that well. This is not only one of the earliest uses of data visualization, but by solving this problem, Dr. John Snow also founded spatial analysis and modern epidemiology. In this Python project, I reanalyzed the data and recreated his famous map.

  • The GitHub History of the Scala Language: Open source projects contain entire development histories, such as who made changes, the changes themselves, and code reviews. In this project, I was challenged to read in, clean up, and visualize the real-world project repository of Scala, which spans data from a version control system (Git) as well as a project hosting site (GitHub). With almost 30,000 commits and a history spanning over ten years, Scala is a mature language. I found out who has had the most influence on its development and who the experts are. The dataset includes the project history of Scala retrieved from Git and GitHub as a set of CSV files.

  • The Android App Market on Google Play: Mobile apps are everywhere. They are easy to create and can be lucrative. Because of these two factors, more and more apps are being developed. In this project, I did a comprehensive analysis of the Android app market by comparing over ten thousand apps in Google Play across different categories. I looked for insights in the data to devise strategies to drive growth and retention. The data for this project was scraped from the Google Play website. While there are many popular datasets for the Apple App Store, there aren't many for Google Play apps, partly because the latter is harder to scrape than the former. The data files are as follows:
      - apps.csv: contains all the details of the apps on Google Play. These are the features that describe an app.
      - user_reviews.csv: contains 100 reviews for each app, most helpful first. The text in each review has been pre-processed, passed through a sentiment analyzer engine, and tagged with its sentiment score.

  • A Visual History of Nobel Prize Winners: The Nobel Prize is perhaps the world's best-known scientific award. Every year it is given to scientists and scholars in chemistry, literature, physics, medicine, economics, and peace. The first Nobel Prize was handed out in 1901, and at that time the prize was Eurocentric and male-focused, but nowadays it's not biased in any way. Well, let's find out! What characteristics do the prize winners have? Which country gets it most often? And has anybody gotten it twice? The dataset used in this project is from The Nobel Foundation on Kaggle.

  • Hypothesis Testing in Healthcare: Pharmaceutical drugs have become an essential part of our health. Therefore, they need to be safe, with few or no adverse effects. In this scenario, I worked with a non-profit that advocates for pharmaceutical drug safety. One of its tasks is to create reports on drugs independent of the drug manufacturer. I conducted several hypothesis tests using Python to determine if the adverse reactions of a hypothetical drug are significant. I also checked if factors such as age significantly influence the adverse reactions.

  • Predictive Modeling for Agriculture: I dove into agriculture using supervised machine learning and feature selection to aid farmers in crop cultivation and solve real-world problems.

  • Exploring NYC Public School Test Result Scores: Every year, school test results play a role in deciding the fate of millions of students. In America, the SAT is a major part of the college admissions process. In this project, I worked with a dataset containing test performance from NYC's public schools. I identified the schools with top math results, looked at how performance varies by borough, and found the top ten performing schools across the city.

  • Analyzing Crime in Los Angeles: Los Angeles, California, attracts people from all over the world who are looking to be successful and make a name for themselves. And with that comes a lot of opportunities, not always of the good kind! While LA has a reputation for beautiful weather and a laid-back lifestyle, it is also known for a high volume of crime, which is not particularly surprising given that it is the second most populous US city. In this project, I served as a data detective, supporting the Los Angeles Police Department (LAPD) in analyzing their crime data to help shape how they should allocate resources to best protect the people of their city!
