In [1]:
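# The core pandas skill: selecting and filtering data.
# Filtering is among the most frequent operations in a data analysis workflow, and later steps (statistics, aggregation, visualization) usually depend on it.

# Marketing analysis:

# "Find all customers who spent more than 500 yuan in the past 30 days and who live in Beijing."

# "Find all users who bought product A but never bought product B, so we can target them with a campaign."

# Financial risk control:

# "Find all transactions that occurred late at night (1-4 a.m.) with an amount above 10,000 yuan."

# "Find all applicants whose credit card applications were rejected more than 3 times in the past week."

# Operations analysis:

# "Find all user reviews that contain the keywords 'lag', 'crash', or 'cannot log in'."

# "Find all users who never logged in within 7 days of registering, and analyze why they churned."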
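
As a rough illustration of how one of these requests maps to code, here is a minimal sketch of the "bought A but not B" case. The `orders` table, its column names, and its values are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical order log: one row per purchase (not part of the Netflix data).
orders = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 4],
    "product": ["A", "B", "A", "B", "C", "A"],
})

# Collect the users who bought each product, using a boolean filter per product.
bought_a = set(orders.loc[orders["product"] == "A", "user_id"].tolist())
bought_b = set(orders.loc[orders["product"] == "B", "user_id"].tolist())

# Set difference: users in A's set that are absent from B's set.
target_users = sorted(bought_a - bought_b)
print(target_users)  # [2, 4]
```

The same result could be reached with `groupby`, but plain sets keep the logic of "in A, not in B" visible.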

In [2]:
import pandas as pd
df = pd.read_csv("netflix_titles.csv")

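# Single-condition filtering (the basics)
# Syntax: df[df['column'] <condition>]
# The condition here: the value in the 'type' column equals (==) 'Movie'
movies_df = df[df['type'] == 'Movie']

# Print the number of movies, then preview the first 5 filtered rows
print(f"Total number of movies in the dataset: {len(movies_df)}")
print("\nPreview of the first 5 filtered movie rows:")
movies_df.head()

Total number of movies in the dataset: 6131

Preview of the first 5 filtered movie rows: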


Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
6,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,"September 24, 2021",2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...","September 24, 2021",1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s..."
9,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...
12,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic","September 23, 2021",2021,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...
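
The filter above works in two steps, which the following sketch makes explicit. The `toy` DataFrame is a hypothetical miniature of the Netflix table, built here so the example runs without the CSV file.

```python
import pandas as pd

# Hypothetical four-row stand-in for the real dataset.
toy = pd.DataFrame({
    "title": ["Alpha", "Bravo", "Charlie", "Delta"],
    "type": ["Movie", "TV Show", "Movie", "TV Show"],
})

# Step 1: the comparison yields a boolean Series aligned with toy's index.
is_movie = toy["type"] == "Movie"
print(is_movie.tolist())  # [True, False, True, False]

# Step 2: indexing with that Series keeps only the rows marked True.
toy_movies = toy[is_movie]
print(toy_movies["title"].tolist())  # ['Alpha', 'Charlie']

# The original row labels survive the filter, which is why the preview
# above shows index values 0, 6, 7, 9, 12 rather than 0 through 4.
print(toy_movies.index.tolist())  # [0, 2]
```

If consecutive labels are wanted after filtering, `reset_index(drop=True)` renumbers the rows.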


In [3]:
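# Multi-condition filtering (intermediate)
# Real work often requires several conditions to hold at once.
# Key syntax:
# Use & for "and" (AND).
# Use | for "or" (OR).
# Wrap every condition in parentheses ().

# Task: for our report, select the US TV shows released after 2020.

us_shows_after_2020_df = df[
    (df['type'] == 'TV Show') &
    (df['release_year'] > 2020) &
    (df['country'] == 'United States')
]
# Write out df[...] in every condition. A bare ['release_year'] > 2020 compares a Python list with an int and raises a TypeError.

print(f'Number of US TV shows released after 2020: {len(us_shows_after_2020_df)}\n')

us_shows_after_2020_df.head()

# Jupyter's "last line" display rule:
# Put everything that needs print() first, and put the DataFrame or chart you want rendered nicely on the last line of the cell.
# Jupyter sees that the last line evaluates to a result (here a DataFrame)
# and renders that result as a formatted HTML table.

# print() emits output wherever it appears, while head() only returns a value. Jupyter auto-displays a returned value only when it is the last expression in the cell, so a head() followed by more code shows nothing unless it is wrapped in print() or display().

Number of US TV shows released after 2020: 89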



Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
15,s16,TV Show,Dear White People,,"Logan Browning, Brandon P. Bell, DeRon Horton,...",United States,"September 22, 2021",2021,TV-MA,4 Seasons,"TV Comedies, TV Dramas",Students of color navigate the daily slights a...
40,s41,TV Show,He-Man and the Masters of the Universe,,"Yuri Lowenthal, Kimberly Brooks, Antony Del Ri...",United States,"September 16, 2021",2021,TV-Y7,1 Season,"Kids' TV, TV Sci-Fi & Fantasy",Mighty teen Adam and his heroic squad of misfi...
55,s56,TV Show,Nailed It,,"Nicole Byer, Jacques Torres",United States,"September 15, 2021",2021,TV-PG,6 Seasons,Reality TV,Home bakers with a terrible track record take ...
82,s83,TV Show,Lucifer,,"Tom Ellis, Lauren German, Kevin Alejandro, D.B...",United States,"September 10, 2021",2021,TV-14,6 Seasons,"Crime TV Shows, TV Comedies, TV Dramas","Bored with being the Lord of Hell, the devil r..."
97,s98,TV Show,Kid Cosmic,,"Jack Fisher, Tom Kenny, Amanda C. Miller, Kim ...",United States,"September 7, 2021",2021,TV-Y7,2 Seasons,"Kids' TV, TV Comedies, TV Sci-Fi & Fantasy",A boy's superhero dreams come true when he fin...
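
The rule about parentheses comes from Python operator precedence: `&` and `|` bind more tightly than comparisons such as `>` and `==`. The sketch below shows the three operators on a hypothetical `toy` table and what happens when the parentheses are left out.

```python
import pandas as pd

# Hypothetical stand-in data with the same column names as the real dataset.
toy = pd.DataFrame({
    "type": ["TV Show", "TV Show", "Movie", "TV Show"],
    "release_year": [2021, 2019, 2021, 2021],
    "country": ["United States", "United States", "United States", "Japan"],
})

# AND: every parenthesized condition must be True for a row to survive.
recent_us_shows = toy[
    (toy["type"] == "TV Show")
    & (toy["release_year"] > 2020)
    & (toy["country"] == "United States")
]
print(len(recent_us_shows))  # 1

# OR: at least one condition must be True.
japan_or_old = toy[(toy["country"] == "Japan") | (toy["release_year"] < 2020)]
print(japan_or_old.index.tolist())  # [1, 3]

# NOT: ~ inverts a boolean mask.
not_movies = toy[~(toy["type"] == "Movie")]
print(len(not_movies))  # 3

# Without parentheses the expression parses as
#   toy["release_year"] > (2020 & toy["release_year"]) < 2022
# which is a chained comparison and raises ValueError.
try:
    toy[toy["release_year"] > 2020 & toy["release_year"] < 2022]
    unparenthesized_failed = False
except ValueError:
    unparenthesized_failed = True
print(unparenthesized_failed)  # True
```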


In [7]:
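# Fuzzy string filtering (advanced)

# Sometimes we want a containment match rather than an exact value.

# Task: the report may analyze romance-themed content, so select every work whose title contains "Love".
# .str exposes string operations over the whole column
# .contains('love') checks whether each cell contains the substring "love"
# na=False treats empty cells (NaN) as not matching, which avoids an error when the mask is used for indexing
# case=False ignores case, so both "love" and "Love" are found

love_shows = df[df['title'].str.contains('love', na=False, case=False)]
# Adding .str tells pandas to use its string-method accessor,
# apply contains() to every element of the df['title'] column,
# and return the per-element results (True or False) in the original order.

print(f"\nNumber of titles containing 'Love': {len(love_shows)}")
love_shows.head()


Number of titles containing 'Love': 196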


Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
25,s26,TV Show,Love on the Spectrum,,Brooke Satchwell,Australia,"September 21, 2021",2021,TV-14,2 Seasons,"Docuseries, International TV Shows, Reality TV",Finding love can be hard for anyone. For young...
158,s159,Movie,Love Don't Cost a Thing,Troy Byer,"Nick Cannon, Christina Milian, Kenan Thompson,...",United States,"September 1, 2021",2003,PG-13,101 min,"Comedies, Romantic Movies",A nerdy teen tries to make himself cool by ass...
159,s160,Movie,Love in a Puff,Pang Ho-cheung,"Miriam Chin Wah Yeung, Shawn Yue, Singh Hartih...",Hong Kong,"September 1, 2021",2010,TV-MA,103 min,"Comedies, Dramas, International Movies",When the Hong Kong government enacts a ban on ...
206,s207,Movie,"LSD: Love, Sex Aur Dhokha",Dibakar Banerjee,"Nushrat Bharucha, Anshuman Jha, Neha Chauhan, ...",India,"August 27, 2021",2010,TV-MA,112 min,"Dramas, Independent Movies, International Movies",This provocative drama examines how the voyeur...
227,s228,Movie,Really Love,Angel Kristi Williams,"Kofi Siriboe, Yootha Wong-Loi-Sing, Michael Ea...",United States,"August 25, 2021",2020,TV-MA,95 min,"Dramas, Independent Movies, Romantic Movies",A rising Black painter tries to break into a c...
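
One caveat with substring matching is that "love" also occurs inside longer words. The sketch below compares the substring match with a whole-word match on hypothetical titles; it relies on `str.contains` treating its pattern as a regular expression by default, where `\b` marks a word boundary.

```python
import pandas as pd

# Hypothetical titles, including a missing value and a false friend ("Gloves").
toy = pd.DataFrame({
    "title": ["Love Actually", "Gloves Off", None, "Really Love", "LOVE"],
})

# Substring match, as in the cell above: "Gloves Off" matches too.
substring_mask = toy["title"].str.contains("love", case=False, na=False)
print(substring_mask.tolist())  # [True, True, False, True, True]

# Whole-word match: \b requires a word boundary on both sides of "love".
word_mask = toy["title"].str.contains(r"\blove\b", case=False, na=False)
print(word_mask.tolist())  # [True, False, False, True, True]

print(toy[word_mask]["title"].tolist())  # ['Love Actually', 'Really Love', 'LOVE']
```

Whether the count of 196 above should include such partial matches depends on what the report needs; for literal text containing regex metacharacters, `regex=False` is the safer setting.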


In [6]:
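# Note: selecting a single column from a DataFrame returns a Series, which pairs an index with values and behaves like a labeled one-dimensional array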
df['title'].head()  

0     Dick Johnson Is Dead
1            Blood & Water
2                Ganglands
3    Jailbirds New Orleans
4             Kota Factory
Name: title, dtype: object
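
The Series/DataFrame distinction also explains the filters used earlier: every condition is itself a boolean Series. A small sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical two-row table for illustration.
toy = pd.DataFrame({"title": ["Alpha", "Bravo"], "release_year": [2020, 2021]})

single = toy["title"]      # single brackets with a name -> Series
double = toy[["title"]]    # a list of names -> one-column DataFrame
mask = toy["release_year"] > 2020  # a comparison -> boolean Series

print(type(single).__name__)  # Series
print(type(double).__name__)  # DataFrame
print(single.shape, double.shape)  # (2,) (2, 1)
print(mask.dtype)  # bool
```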

In [None]:
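# Three exercises

# 1. How many Japanese (Japan) movies (Movie) are in the dataset?

# 2. Select all movies directed by David Fincher. (Hint: the director is in the director column.)

# 3. How many works are rated TV-14 and listed under Canada? (Hint: the rating is in the rating column.)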

In [9]:
# 1. How many Japanese (Japan) movies (Movie) are in the dataset?

japan_movies = df[(df['country'] == 'Japan') & (df['type'] == 'Movie')]

print(f'Number of Japanese movies: {len(japan_movies)}')

Number of Japanese movies: 76
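
Note that `==` matches only rows whose country value is exactly "Japan". The previews earlier show that the country column can hold several comma-separated countries (for example "Germany, Czech Republic"), so co-productions are excluded by this filter. The sketch below contrasts the two readings on hypothetical rows; which one is correct depends on the question being asked.

```python
import pandas as pd

# Hypothetical rows: one Japan-only movie, one co-production, one missing
# country, and one Japan-only TV show.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "type": ["Movie", "Movie", "Movie", "TV Show"],
    "country": ["Japan", "Japan, United States", None, "Japan"],
})

# Exact match: the country value must be exactly "Japan".
exact = toy[(toy["country"] == "Japan") & (toy["type"] == "Movie")]
print(len(exact))  # 1

# Containment match: "Japan" may appear anywhere in the country string.
any_japan = toy[
    toy["country"].str.contains("Japan", na=False, regex=False)
    & (toy["type"] == "Movie")
]
print(len(any_japan))  # 2
```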


In [10]:
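# 2. Select all movies directed by David Fincher. (Hint: the director is in the director column.)

# The task asks for movies, so the type condition is included alongside the director match
fincher_movies = df[(df['director'] == 'David Fincher') & (df['type'] == 'Movie')]

fincher_movies.head()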

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
600,s601,Movie,The Game,David Fincher,"Michael Douglas, Sean Penn, Deborah Kara Unger...",United States,"July 1, 2021",1997,R,129 min,Thrillers,An aloof investment banker's life spirals into...
1595,s1596,Movie,MANK,David Fincher,"Gary Oldman, Amanda Seyfried, Charles Dance, L...",United States,"December 4, 2020",2020,R,133 min,"Dramas, Independent Movies",1930s Hollywood is reevaluated through the eye...
7701,s7702,Movie,Panic Room,David Fincher,"Jodie Foster, Forest Whitaker, Dwight Yoakam, ...",United States,"August 1, 2019",2002,R,112 min,Thrillers,A woman and her daughter are caught in a game ...
8320,s8321,Movie,The Girl with the Dragon Tattoo,David Fincher,"Daniel Craig, Rooney Mara, Christopher Plummer...","United States, Sweden, Norway","January 5, 2021",2011,R,158 min,"Dramas, Thrillers",When a young computer hacker is tasked with in...
8511,s8512,Movie,The Social Network,David Fincher,"Jesse Eisenberg, Andrew Garfield, Justin Timbe...",United States,"April 1, 2020",2010,PG-13,121 min,Dramas,Director David Fincher's biographical drama ch...
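
The director column contains missing values (the TV show previews earlier show it empty), yet the equality filter needs no `na=` argument: comparing a missing value with `==` simply yields False. The sketch below shows this on hypothetical rows and also uses `.loc[rows, columns]` to keep only the columns of interest.

```python
import pandas as pd

# Hypothetical rows; only "The Game" (1997) is taken from the output above.
toy = pd.DataFrame({
    "title": ["The Game", "Some Series", "Other Film"],
    "type": ["Movie", "TV Show", "Movie"],
    "director": ["David Fincher", None, "Someone Else"],
    "release_year": [1997, 2017, 2005],
})

# A missing director compares as False, so the mask is purely boolean.
mask = (toy["director"] == "David Fincher") & (toy["type"] == "Movie")
print(mask.tolist())  # [True, False, False]

# .loc takes the row mask first and the column list second.
fincher = toy.loc[mask, ["title", "release_year"]].sort_values("release_year")
print(fincher["title"].tolist())  # ['The Game']
print(fincher.columns.tolist())  # ['title', 'release_year']
```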


In [12]:
# 3. How many works are rated TV-14 and listed under Canada? (Hint: the rating is in the rating column.)

tv14_canada = df[(df['rating'] == 'TV-14') & (df['country'] == 'Canada')]

print(f'{len(tv14_canada)} works are rated TV-14 and listed under Canada.')

26 works are rated TV-14 and listed under Canada.
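
When only the count is needed, as in this exercise, the filtered DataFrame does not have to be built at all: summing a boolean mask counts its True values. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical rows with the same column names as the real dataset.
toy = pd.DataFrame({
    "rating": ["TV-14", "TV-14", "TV-MA", "TV-14"],
    "country": ["Canada", "United States", "Canada", "Canada"],
})

mask = (toy["rating"] == "TV-14") & (toy["country"] == "Canada")

# Both approaches give the same number; sum() treats True as 1 and False as 0.
count_via_len = len(toy[mask])
count_via_sum = int(mask.sum())
print(count_via_len, count_via_sum)  # 2 2
```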
