Python Correlation with Movies Data

Python project for finding correlations between variables.
The findings are based on the correlation between the gross revenue of movies and other variables, such as the movies' budgets and the number of votes they received. The dataset is available on link.

Built with

  • Jupyter notebook

Libraries used

  • Pandas
  • Matplotlib
  • Seaborn

Highlights

  • Scatter Plot
  • Regression Plot
  • Heatmap
  • Python regex
  • Python Categorical data

What is being done?

  1. Preliminary tasks
    1.1 Importing libraries
    1.2 Creating a dataframe movies from CSV
    1.3 Checking for missing data
    1.4 Checking the datatypes of the columns
    1.5 Altering the datatypes of a few columns
    1.6 Adding a new column for correctness and data consistency
    1.7 Dropping duplicate rows
  2. Task: Finding whether there is a correlation between two particular columns: budget and gross
    2.1 Scatterplot of budget and gross
    2.2 Regression plot of budget and gross
    2.3 Finding pairwise correlation of numeric columns
    2.4 Heatmap of the correlation values
  3. Task: Finding general correlations among the different columns of the data as a whole
    3.1 Converting the non-numeric values to numeric values using the categorical datatype
    3.2 Finding pairwise correlation of numeric columns
    3.3 Heatmap of the correlation values
    3.4 Displaying the correlation values linearly
    3.5 Sorting the linearly displayed correlation values
    3.6 Finding the pairs with high correlation
  4. Observation and Conclusion
    4.1 There is a high correlation between budget and gross: high-budget movies are observed to have achieved high gross revenues.
    4.2 There is also a fairly high correlation between votes and gross: movies with high gross revenues have received a large number of votes.
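
The outline above can be sketched in a few lines of pandas. This is a minimal sketch, not the notebook's actual code: the small inline dataframe is a hypothetical stand-in for the movies CSV, the column names (name, company, released, budget, gross, votes) and the year_released column are assumptions, and the 0.5 threshold for "high" correlation is chosen only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the movies CSV (a real run would use pd.read_csv).
# All column names and values below are illustrative assumptions.
movies = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "E"],
    "company": ["X", "Y", "X", "Z", "Y", "Y"],
    "released": [
        "June 13, 1980 (United States)",
        "July 2, 1985 (United States)",
        "May 16, 1990 (United States)",
        "March 3, 1995 (United States)",
        "April 21, 2000 (United States)",
        "April 21, 2000 (United States)",
    ],
    "budget": [10.0, 20.0, 30.0, 40.0, 50.0, 50.0],
    "gross": [25.0, 38.0, 70.0, 85.0, 120.0, 120.0],
    "votes": [100.0, 300.0, 250.0, 500.0, 600.0, 600.0],
})

# 1.3 Share of missing values per column
missing_share = movies.isnull().mean()

# 1.5 Cast float columns that hold whole numbers to integers
for col in ["budget", "gross", "votes"]:
    movies[col] = movies[col].astype("int64")

# 1.6 Derive a year column from the release string with a regex
# (an assumed use of the regex highlight, not confirmed by the README)
movies["year_released"] = (
    movies["released"].str.extract(r"(\d{4})", expand=False).astype("int64")
)

# 1.7 Drop exact duplicate rows
movies = movies.drop_duplicates().reset_index(drop=True)

# 2.3 Pairwise correlation of the numeric columns (pandas defaults to Pearson)
numeric_corr = movies.select_dtypes(include="number").corr()

# 3.1 Encode every non-numeric column as integer category codes
encoded = movies.copy()
for col in encoded.select_dtypes(exclude="number").columns:
    encoded[col] = encoded[col].astype("category").cat.codes

# 3.2 Correlation over all columns, now that everything is numeric
full_corr = encoded.corr()

# 3.4 and 3.5 Flatten the matrix to (column, column) pairs and sort them
pairs = full_corr.unstack().sort_values()

# 3.6 Keep highly correlated pairs, skipping each column paired with itself
is_self_pair = pairs.index.get_level_values(0) == pairs.index.get_level_values(1)
high_pairs = pairs[(pairs > 0.5) & ~is_self_pair]
```

For the plotting steps, `sns.regplot(x="budget", y="gross", data=movies)` and `sns.heatmap(numeric_corr, annot=True)` would draw the regression plot and the heatmap from the same objects. Note that category codes are arbitrary integers, so correlations involving encoded columns are best read as rough signals rather than precise measurements.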