Skip to content

matthew2004-ap/Netflix_Projects

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 

Repository files navigation

Netflix Titles — Power BI Project

This repository contains a single dataset file, netflix_titles.csv, intended for exploration and reporting in Power BI Desktop. The README explains the dataset, assumptions, quick import steps for Power BI, suggested analyses and visuals, and credits/licenses.

Quick summary

  • File: netflix_titles.csv (located at repository root)
  • Purpose: Exploratory analysis, dashboards and reporting in Power BI using Netflix titles metadata (movies & TV shows).

Dataset (columns in this CSV)

I inspected the netflix_titles.csv file in this repo and verified the header and sample rows. The CSV uses the common Netflix titles schema and includes these columns (exact names):

  • show_id — unique id per title (e.g., s1, s2)
  • type — "Movie" or "TV Show"
  • title — title name
  • director — director(s) (may be blank)
  • cast — main cast (comma-separated list; may be blank)
  • country — production country or countries (comma-separated)
  • date_added — date the title was added to Netflix (string like "September 24, 2021")
  • release_year — year of release (integer)
  • rating — content rating (e.g., PG-13, TV-MA)
  • duration — runtime or season info (examples: "90 min", "1 Season", "2 Seasons", "9 Seasons", "13 min")
  • listed_in — comma-separated genres / categories
  • description — short text description

Sample notes from the file:

  • date_added values are in the format "MonthName DD, YYYY" (e.g., "September 24, 2021"). Use Power Query to parse this to Date.
  • duration mixes minutes and season counts. Movies usually show minutes ("90 min"); TV shows show season counts ("1 Season", "2 Seasons"). Some short-form items show minutes ("13 min").
  • cast, country, and listed_in are multi-valued comma-separated strings requiring splitting/unpivoting if you want one row per cast member/genre.

If your CSV version has extra columns or different names, adapt mapping during import. The rest of this README uses these exact column names as present in the file included in the repo.

How to open in Power BI Desktop

  1. Open Power BI Desktop.
  2. Choose "Get data" → "Text/CSV" and select netflix_titles.csv.
  3. In the preview dialog, click "Transform Data" to open Power Query for cleaning.
  4. Recommended Transform steps in Power Query:
    • Convert date_added to Date (Split or parse if it contains month names).
    • Clean duration: split into numeric duration and unit. For movies, strip " min" and convert to number; for TV shows you may keep the season count separate.
    • Split listed_in and cast into lists if you plan to analyze by individual genre/cast member (use "Split Column" by delimiter and/or unpivot techniques).
    • Trim whitespace and replace missing/nulls for consistent filtering.

Power Query (M) examples

Here are a couple of quick Power Query / M transformations you can apply in the Query Editor.

  1. Parse date_added to Date (handle blank values):

    • In Power Query Advanced Editor you can add a step like:

      let Source = Csv.Document(File.Contents("netflix_titles.csv"),[Delimiter=",", Columns=12, Encoding=65001, QuoteStyle=QuoteStyle.Csv]), Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars=true]), Parsed = Table.AddColumn(Promoted, "date_added_dt", each try Date.FromText(Text.Trim([date_added])) otherwise try Date.FromText(Date.ToText(Date.From(Text.Trim([date_added])))) otherwise null) in Parsed

    • Simpler: select date_added column → Transform → Using Locale → Date → English (United States) to parse month names.

  2. Extract movie runtime in minutes into a numeric column duration_minutes:

    • Add Column → Custom Column with formula:

      = if Text.Contains([duration], "min") then Number.From(Text.Select([duration], {"0".."9"})) else null

    • Or, in M:

      Table.AddColumn(Promoted, "duration_minutes", each try Number.From(Text.Select([duration],{"0".."9"})) otherwise null)

  3. Split multi-valued columns (listed_in, cast, country) for per-item analysis:

    • Use Split Column by Delimiter (comma) → then Transform each to trimmed text.
    • Or to create one row per genre/actor: Split into rows (Home → Split Column → By Delimiter → Advanced options → Split into Rows).

Suggested visuals and analyses

  • Titles over time: bar/line chart of count of titles by release_year or by date_added (month/year).
  • Titles by type: donut or stacked bar showing Movies vs TV Shows.
  • Top genres: split listed_in and show top N genres by title count.
  • Country breakdown: map or horizontal bar chart for top producing countries.
  • Ratings distribution: stacked bar of rating by type or by year.
  • Duration analysis: histogram of movie runtimes (after extracting minutes).
  • Cast / Director network: create tables listing top directors and actors by title count.

Example DAX measures (adjust column/table names if different):

Example DAX measures (adjust table name to the one you use when loading the CSV, e.g. netflix_titles):

  • Total Titles = COUNTROWS('netflix_titles')
  • Movies Count = CALCULATE(COUNTROWS('netflix_titles'), 'netflix_titles'[type] = "Movie")
  • TV Shows Count = CALCULATE(COUNTROWS('netflix_titles'), 'netflix_titles'[type] = "TV Show")
  • Titles Added This Year = CALCULATE(COUNTROWS('netflix_titles'), YEAR('netflix_titles'[date_added_dt]) = YEAR(TODAY()))
  • Average Movie Runtime = AVERAGEX(FILTER('netflix_titles', NOT(ISBLANK('netflix_titles'[duration_minutes]))), 'netflix_titles'[duration_minutes])

These measures assume you created a Date/DateTime column date_added_dt and numeric duration_minutes in Power Query.

Tip: If duration contains text like "90 min", create a Power Query column duration_minutes by removing non-numeric characters and converting to Whole Number. For TV shows you can extract season counts separately.

Small contract / success criteria

  • Inputs: netflix_titles.csv in repository root
  • Output: Power BI report visualizing title counts, genre breakdown, and duration distribution
  • Error modes: missing columns (adjust mapping), unparseable dates/durations (use Power Query transformations)

Edge cases to watch for

  • Missing or blank date_added or duration values
  • Multiple countries/genres in one cell (need to split/unpivot for per-item counts)
  • Inconsistent duration formats between movies and TV shows

Credits & License

This README references the widely shared "Netflix Movies and TV Shows" dataset commonly used in public tutorials (for example, copies hosted on Kaggle and other public data repositories). If your copy of the CSV has a different origin, update this credit accordingly.

Usage: This repository is intended for educational, exploration and reporting purposes only. The dataset contains metadata originally published by Netflix and/or aggregated by third parties — you must follow the original dataset's license and Netflix's terms of use when redistributing data or derivative works. If you are publishing dashboards or derivatives externally, check the dataset source for any attribution or redistribution restrictions.

Suggested license note for this repo (not a legal statement):

"This repository contains an example dataset for analysis and a set of instructions for working with it in Power BI. The dataset file netflix_titles.csv may be redistributed only under the terms permitted by its original publisher. The README and example transforms are published under the MIT License — use at your own risk."

If you'd like, I can add a LICENSE file (MIT) to the repo; tell me if you want that.

Next steps / suggestions

  • Build a sample Power BI report with the visuals above and include a .pbix sample file in the repo (optional).
  • Add a small notebook or script (Python/R) that performs basic cleaning (extract duration minutes, parse dates) to show reproducible ETL.

Contact

If you want improvements to this README or an example .pbix file added, open an issue or contact the repo owner.


Generated README — quick start for exploring netflix_titles.csv in Power BI.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published