This repository contains a single dataset file, netflix_titles.csv, intended for exploration and reporting in Power BI Desktop. The README explains the dataset, assumptions, quick import steps for Power BI, suggested analyses and visuals, and credits/licenses.
- File:
netflix_titles.csv(located at repository root) - Purpose: Exploratory analysis, dashboards and reporting in Power BI using Netflix titles metadata (movies & TV shows).
I inspected the netflix_titles.csv file in this repo and verified the header and sample rows. The CSV uses the common Netflix titles schema and includes these columns (exact names):
show_id— unique id per title (e.g., s1, s2)type— "Movie" or "TV Show"title— title namedirector— director(s) (may be blank)cast— main cast (comma-separated list; may be blank)country— production country or countries (comma-separated)date_added— date the title was added to Netflix (string like "September 24, 2021")release_year— year of release (integer)rating— content rating (e.g., PG-13, TV-MA)duration— runtime or season info (examples: "90 min", "1 Season", "2 Seasons", "9 Seasons", "13 min")listed_in— comma-separated genres / categoriesdescription— short text description
Sample notes from the file:
date_addedvalues are in the format "MonthName DD, YYYY" (e.g., "September 24, 2021"). Use Power Query to parse this to Date.durationmixes minutes and season counts. Movies usually show minutes ("90 min"); TV shows show season counts ("1 Season", "2 Seasons"). Some short-form items show minutes ("13 min").cast,country, andlisted_inare multi-valued comma-separated strings requiring splitting/unpivoting if you want one row per cast member/genre.
If your CSV version has extra columns or different names, adapt mapping during import. The rest of this README uses these exact column names as present in the file included in the repo.
- Open Power BI Desktop.
- Choose "Get data" → "Text/CSV" and select
netflix_titles.csv. - In the preview dialog, click "Transform Data" to open Power Query for cleaning.
- Recommended Transform steps in Power Query:
- Convert
date_addedto Date (Split or parse if it contains month names). - Clean
duration: split into numeric duration and unit. For movies, strip " min" and convert to number; for TV shows you may keep the season count separate. - Split
listed_inandcastinto lists if you plan to analyze by individual genre/cast member (use "Split Column" by delimiter and/or unpivot techniques). - Trim whitespace and replace missing/nulls for consistent filtering.
- Convert
Here are a couple of quick Power Query / M transformations you can apply in the Query Editor.
-
Parse
date_addedto Date (handle blank values):-
In Power Query Advanced Editor you can add a step like:
let Source = Csv.Document(File.Contents("netflix_titles.csv"),[Delimiter=",", Columns=12, Encoding=65001, QuoteStyle=QuoteStyle.Csv]), Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars=true]), Parsed = Table.AddColumn(Promoted, "date_added_dt", each try Date.FromText(Text.Trim([date_added])) otherwise try Date.FromText(Date.ToText(Date.From(Text.Trim([date_added])))) otherwise null) in Parsed
-
Simpler: select
date_addedcolumn → Transform → Using Locale → Date → English (United States) to parse month names.
-
-
Extract movie runtime in minutes into a numeric column
duration_minutes:-
Add Column → Custom Column with formula:
= if Text.Contains([duration], "min") then Number.From(Text.Select([duration], {"0".."9"})) else null
-
Or, in M:
Table.AddColumn(Promoted, "duration_minutes", each try Number.From(Text.Select([duration],{"0".."9"})) otherwise null)
-
-
Split multi-valued columns (
listed_in,cast,country) for per-item analysis:- Use Split Column by Delimiter (comma) → then Transform each to trimmed text.
- Or to create one row per genre/actor: Split into rows (Home → Split Column → By Delimiter → Advanced options → Split into Rows).
- Titles over time: bar/line chart of count of titles by
release_yearor bydate_added(month/year). - Titles by type: donut or stacked bar showing Movies vs TV Shows.
- Top genres: split
listed_inand show top N genres by title count. - Country breakdown: map or horizontal bar chart for top producing countries.
- Ratings distribution: stacked bar of
ratingbytypeor by year. - Duration analysis: histogram of movie runtimes (after extracting minutes).
- Cast / Director network: create tables listing top directors and actors by title count.
Example DAX measures (adjust column/table names if different):
Example DAX measures (adjust table name to the one you use when loading the CSV, e.g. netflix_titles):
- Total Titles = COUNTROWS('netflix_titles')
- Movies Count = CALCULATE(COUNTROWS('netflix_titles'), 'netflix_titles'[type] = "Movie")
- TV Shows Count = CALCULATE(COUNTROWS('netflix_titles'), 'netflix_titles'[type] = "TV Show")
- Titles Added This Year = CALCULATE(COUNTROWS('netflix_titles'), YEAR('netflix_titles'[date_added_dt]) = YEAR(TODAY()))
- Average Movie Runtime = AVERAGEX(FILTER('netflix_titles', NOT(ISBLANK('netflix_titles'[duration_minutes]))), 'netflix_titles'[duration_minutes])
These measures assume you created a Date/DateTime column date_added_dt and numeric duration_minutes in Power Query.
Tip: If duration contains text like "90 min", create a Power Query column duration_minutes by removing non-numeric characters and converting to Whole Number. For TV shows you can extract season counts separately.
- Inputs:
netflix_titles.csvin repository root - Output: Power BI report visualizing title counts, genre breakdown, and duration distribution
- Error modes: missing columns (adjust mapping), unparseable dates/durations (use Power Query transformations)
- Missing or blank
date_addedordurationvalues - Multiple countries/genres in one cell (need to split/unpivot for per-item counts)
- Inconsistent
durationformats between movies and TV shows
This README references the widely shared "Netflix Movies and TV Shows" dataset commonly used in public tutorials (for example, copies hosted on Kaggle and other public data repositories). If your copy of the CSV has a different origin, update this credit accordingly.
Usage: This repository is intended for educational, exploration and reporting purposes only. The dataset contains metadata originally published by Netflix and/or aggregated by third parties — you must follow the original dataset's license and Netflix's terms of use when redistributing data or derivative works. If you are publishing dashboards or derivatives externally, check the dataset source for any attribution or redistribution restrictions.
Suggested license note for this repo (not a legal statement):
"This repository contains an example dataset for analysis and a set of instructions for working with it in Power BI. The dataset file netflix_titles.csv may be redistributed only under the terms permitted by its original publisher. The README and example transforms are published under the MIT License — use at your own risk."
If you'd like, I can add a LICENSE file (MIT) to the repo; tell me if you want that.
- Build a sample Power BI report with the visuals above and include a
.pbixsample file in the repo (optional). - Add a small notebook or script (Python/R) that performs basic cleaning (extract duration minutes, parse dates) to show reproducible ETL.
If you want improvements to this README or an example .pbix file added, open an issue or contact the repo owner.
Generated README — quick start for exploring netflix_titles.csv in Power BI.