### Prompt

You have a dataset that provides details of the winners of a lottery from March 11, 1996, to October 27, 2018 (dates in the data are day-first, dd/mm/yyyy). In this lottery, players select 6 numbers, and the objective is to match all 6 numbers to win.

Your tasks:

1. During this period, how many times was there just one winner (matched 6 out of 6 numbers) who claimed the entire accumulated jackpot?
2. Identify the date and the highest jackpot amount ever won by a single winner who matched 6 out of 6 numbers.

In [1]:
# import libraries

import pandas as pd
import os

In [3]:
# importing data

cwd = os.getcwd()

file_path = os.path.join(cwd, 'data.csv')

df = pd.read_csv(file_path)

df.head()

Unnamed: 0,Draw ID,Draw Date,Winning number1,Winning number2,Winning number3,Winning number4,Winning number5,Winning number6,Matches 6/6,Prize won by each player (match 6/6),Matches 5/6,Prize won by each player (match 5/6),Matches 4/6,Prize won by each player (match 4/6),Accumulated jackpot,Total Accumulated
0,1,11/03/1996,41,5,4,52,30,33,0,0,17,"39.158,92",2016,33021,YES,"1.714.650,23"
1,2,18/03/1996,9,39,37,49,43,41,1,"2.307.162,23",65,"14.424,02",4488,20891,NO,0
2,3,25/03/1996,36,30,10,11,29,47,2,"391.192,51",62,"10.515,93",4261,15301,NO,0
3,4,01/04/1996,6,59,42,27,1,5,0,0,39,"15.322,24",3311,18048,YES,"717.080,75"
4,5,08/04/1996,1,19,46,6,16,2,0,0,98,"5.318,10",5399,9653,YES,"1.342.488,85"
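
The prize columns use a European number format: a period as the thousands separator and a comma as the decimal mark. As an alternative to cleaning the strings after loading, `read_csv` can parse this format directly through its `thousands` and `decimal` parameters. The sketch below assumes a small inline sample (three rows and four columns copied from the preview above) in place of `data.csv`; the names `sample_csv` and `sample_df` are illustrative only.

```python
import io

import pandas as pd

# Inline stand-in for data.csv: three rows, four columns, copied from the preview above.
sample_csv = io.StringIO(
    'Draw ID,Draw Date,Matches 6/6,Prize won by each player (match 6/6)\n'
    '1,11/03/1996,0,0\n'
    '2,18/03/1996,1,"2.307.162,23"\n'
    '3,25/03/1996,2,"391.192,51"\n'
)

# thousands='.' and decimal=',' tell the parser how to read European-style numbers.
sample_df = pd.read_csv(sample_csv, thousands='.', decimal=',')

# The prize column should now be float64 without any string replacement.
print(sample_df.dtypes)
print(sample_df)
```

If this works on the full file, the string-replacement cells that follow would no longer be needed. It is worth checking the resulting dtypes before relying on it.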


In [6]:
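# removing the thousands separator (period) from the match 6/6 col
# regex=False makes '.' a literal period rather than a regex wildcard

df['Prize won by each player (match 6/6)'] = df['Prize won by each player (match 6/6)'].str.replace('.', '', regex=False)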

df.head()

Unnamed: 0,Draw ID,Draw Date,Winning number1,Winning number2,Winning number3,Winning number4,Winning number5,Winning number6,Matches 6/6,Prize won by each player (match 6/6),Matches 5/6,Prize won by each player (match 5/6),Matches 4/6,Prize won by each player (match 4/6),Accumulated jackpot,Total Accumulated
0,1,11/03/1996,41,5,4,52,30,33,0,0,17,"39.158,92",2016,33021,YES,"1.714.650,23"
1,2,18/03/1996,9,39,37,49,43,41,1,230716223,65,"14.424,02",4488,20891,NO,0
2,3,25/03/1996,36,30,10,11,29,47,2,39119251,62,"10.515,93",4261,15301,NO,0
3,4,01/04/1996,6,59,42,27,1,5,0,0,39,"15.322,24",3311,18048,YES,"717.080,75"
4,5,08/04/1996,1,19,46,6,16,2,0,0,98,"5.318,10",5399,9653,YES,"1.342.488,85"


In [7]:
# replacing the decimal comma with a period so the values can be cast to float

df['Prize won by each player (match 6/6)'] = df['Prize won by each player (match 6/6)'].str.replace(',', '.', regex=False)

df.head()

Unnamed: 0,Draw ID,Draw Date,Winning number1,Winning number2,Winning number3,Winning number4,Winning number5,Winning number6,Matches 6/6,Prize won by each player (match 6/6),Matches 5/6,Prize won by each player (match 5/6),Matches 4/6,Prize won by each player (match 4/6),Accumulated jackpot,Total Accumulated
0,1,11/03/1996,41,5,4,52,30,33,0,0.0,17,"39.158,92",2016,33021,YES,"1.714.650,23"
1,2,18/03/1996,9,39,37,49,43,41,1,2307162.23,65,"14.424,02",4488,20891,NO,0
2,3,25/03/1996,36,30,10,11,29,47,2,391192.51,62,"10.515,93",4261,15301,NO,0
3,4,01/04/1996,6,59,42,27,1,5,0,0.0,39,"15.322,24",3311,18048,YES,"717.080,75"
4,5,08/04/1996,1,19,46,6,16,2,0,0.0,98,"5.318,10",5399,9653,YES,"1.342.488,85"


In [9]:
# converting col to numeric

df['Prize won by each player (match 6/6)'] = df['Prize won by each player (match 6/6)'].astype('float64')

df['Prize won by each player (match 6/6)'].apply(type)

0       <class 'float'>
1       <class 'float'>
2       <class 'float'>
3       <class 'float'>
4       <class 'float'>
             ...       
2087    <class 'float'>
2088    <class 'float'>
2089    <class 'float'>
2090    <class 'float'>
2091    <class 'float'>
Name: Prize won by each player (match 6/6), Length: 2092, dtype: object
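
The three conversion steps above (drop periods, swap the comma for a period, cast to float) can be bundled into one helper, so the same logic can be reused for the 5/6 prize and `Total Accumulated` columns. This is a minimal sketch: `parse_european_number` is a hypothetical name, and the sample values are copied from the previews above.

```python
import pandas as pd


def parse_european_number(series: pd.Series) -> pd.Series:
    """Convert strings such as '2.307.162,23' to floats (2307162.23)."""
    cleaned = (
        series.astype(str)
        .str.replace('.', '', regex=False)   # drop thousands separators
        .str.replace(',', '.', regex=False)  # comma becomes the decimal point
    )
    # to_numeric raises on unexpected values instead of silently producing NaN.
    return pd.to_numeric(cleaned)


# Sample values copied from the previews of the 6/6 and 5/6 prize columns.
prizes = pd.Series(['0', '2.307.162,23', '391.192,51', '39.158,92'])
converted = parse_european_number(prizes)
print(converted.dtype)
print(converted.tolist())
```

The helper assumes the column is still in its raw string form. Applying it to values that are already floats would strip the real decimal point, so it should run only once per column.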

In [11]:
# filtering to where matches 6/6 == 1

df_filtered = df[df['Matches 6/6'] == 1]

df_filtered.head()

Unnamed: 0,Draw ID,Draw Date,Winning number1,Winning number2,Winning number3,Winning number4,Winning number5,Winning number6,Matches 6/6,Prize won by each player (match 6/6),Matches 5/6,Prize won by each player (match 5/6),Matches 4/6,Prize won by each player (match 4/6),Accumulated jackpot,Total Accumulated
1,2,18/03/1996,9,39,37,49,43,41,1,2307162.23,65,"14.424,02",4488,20891,NO,0
10,11,20/05/1996,25,15,58,37,59,38,1,15591365.07,148,"12.706,05",9442,19916,NO,0
16,17,01/07/1996,10,20,6,19,51,13,1,6789869.08,144,"7.628,37",9376,11716,NO,0
23,24,19/08/1996,1,8,14,28,33,43,1,18661679.61,227,"7.897,31",13486,13293,NO,0
28,29,22/09/1996,14,56,58,8,43,3,1,5401793.6,44,"23.083,15",5248,19353,NO,0


### Task 1

In [13]:
# counting draws with exactly one 6/6 winner

one_winner_count = df_filtered.shape[0]

print(f"number of sole winners: {one_winner_count}")

number of sole winners: 364
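
The same count can be cross-checked without building a filtered DataFrame, by summing a boolean mask. The sketch below uses a toy Series holding the `Matches 6/6` values of the first five draws shown earlier; `matches` and `sole_winner_draws` are illustrative names.

```python
import pandas as pd

# Toy stand-in for the 'Matches 6/6' column (values from the first five draws above).
matches = pd.Series([0, 1, 2, 0, 0], name='Matches 6/6')

# The comparison yields a boolean Series; summing it counts the True entries.
sole_winner_draws = int((matches == 1).sum())
print(f"number of sole winners: {sole_winner_draws}")
```

On the full dataset, `(df['Matches 6/6'] == 1).sum()` should agree with `df_filtered.shape[0]`.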


### Task 2

In [14]:
# finding the single-winner draw with the largest 6/6 prize

df_filtered.nlargest(1, 'Prize won by each player (match 6/6)')

Unnamed: 0,Draw ID,Draw Date,Winning number1,Winning number2,Winning number3,Winning number4,Winning number5,Winning number6,Matches 6/6,Prize won by each player (match 6/6),Matches 5/6,Prize won by each player (match 5/6),Matches 4/6,Prize won by each player (match 4/6),Accumulated jackpot,Total Accumulated
1763,1764,25/11/2015,6,7,41,39,29,55,1,205329800.0,401,"58.622,54",33850,99209,NO,0
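
Task 2 asks for only the date and the amount, so it can help to pull those two fields out of the top row and parse the date explicitly as day-first. A minimal sketch, assuming a two-row stand-in built from single-winner rows shown in the outputs above; `winners`, `prize_col`, and `top_row` are illustrative names.

```python
import pandas as pd

# Two single-winner rows copied from the outputs above, trimmed to the relevant columns.
winners = pd.DataFrame({
    'Draw Date': ['18/03/1996', '25/11/2015'],
    'Prize won by each player (match 6/6)': [2307162.23, 205329800.0],
})

# The dates are day-first (dd/mm/yyyy), so the format is passed explicitly.
winners['Draw Date'] = pd.to_datetime(winners['Draw Date'], format='%d/%m/%Y')

prize_col = 'Prize won by each player (match 6/6)'

# idxmax returns the index label of the largest prize; loc fetches that row.
top_row = winners.loc[winners[prize_col].idxmax()]
print(f"{top_row['Draw Date']:%Y-%m-%d}: {top_row[prize_col]:,.2f}")
```

Passing the format explicitly avoids pandas guessing month-first for ambiguous dates such as 11/03/1996.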
