# NumPy (Numerical Python)
<p>NumPy (Numerical Python) is the fundamental, open-source library for scientific computing in Python, providing support for large, multi-dimensional arrays and matrices, along with a vast collection of high-level mathematical functions to operate on these data structures.</p> 
---

## Key Features and Concepts
- N-dimensional Array (ndarray): The core of NumPy is the ndarray object, which is a fast and memory-efficient data structure for homogeneous data (all elements are of the same type). Unlike Python's built-in lists, which can hold elements of different types and are slower for numerical operations, NumPy arrays store data in contiguous memory blocks, enhancing performance.

- Vectorization: NumPy operations are "vectorized," meaning they apply to the entire array at once without requiring explicit Python for loops. This allows for concise code that resembles standard mathematical notation, and it is typically much faster than an equivalent Python loop because the element-wise work runs in compiled code (NumPy's core is written largely in C).

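- Broadcasting: This mechanism allows NumPy to perform arithmetic operations on arrays of different shapes, provided their dimensions are compatible: shapes are compared from the trailing dimension backward, and each pair of dimensions must either be equal or include a 1 (missing leading dimensions are treated as 1).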

- Mathematical Functions: NumPy offers a comprehensive suite of functions for linear algebra, Fourier transforms, matrix manipulations, statistical operations (like mean, median, and standard deviation), and random number generation.

- Ecosystem Integration: NumPy serves as the foundational package for many other major Python data science and machine learning libraries, including Pandas, SciPy, and scikit-learn. 
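
The features listed above can be sketched in a few lines. This is a minimal illustration, not part of the original notebook; the array names (`ages`, `col`, `row`, `grid`) are made up for the example, and the ages are the six non-missing values used in the DataFrame later in this notebook.

```python
import numpy as np

# ndarray: a homogeneous array with a single dtype
ages = np.array([22, 22, 22, 23, 21, 23])

# Vectorization: one expression replaces an explicit Python loop
ages_in_months = ages * 12

# Broadcasting: a (3, 1) column combined with a (4,) row gives a (3, 4) grid
col = np.array([[0], [10], [20]])
row = np.array([1, 2, 3, 4])
grid = col + row

# Mathematical functions: statistics are available as array methods
mean_age = ages.mean()

print(ages_in_months)
print(grid.shape)
print(mean_age)
```

The mean computed here matches the value that `df.Age.mean()` returns further down, since pandas skips the missing entry by default.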

In [3]:
import pandas as pd
import numpy as np

In [4]:
df = pd.DataFrame({'Name': ['prachi', 'chaitnya', 'gunjan', 'ahnis', 'vani', 'muskan', 'mitanshi'], 'Age': [22, 22, 22, 23, np.nan, 21, 23], 'Class': ['Btech', 'BCA', 'Mtech', np.nan, 'BCA', 'BCA', 'Mtech']})
df

,Name,Age,Class
0,prachi,22.0,Btech
1,chaitnya,22.0,BCA
2,gunjan,22.0,Mtech
3,ahnis,23.0,
4,vani,,BCA
5,muskan,21.0,BCA
6,mitanshi,23.0,Mtech


In [5]:
df.isnull()

,Name,Age,Class
0,False,False,False
1,False,False,False
2,False,False,False
3,False,False,True
4,False,True,False
5,False,False,False
6,False,False,False


In [6]:
df.to_excel('data.xlsx')

In [7]:
df.isnull().any()

Name     False
Age       True
Class     True
dtype: bool

In [8]:
df.isnull().sum()

Name     0
Age      1
Class    1
dtype: int64

In [9]:
df.isna().sum()

Name     0
Age      1
Class    1
dtype: int64
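
The two cells above return the same counts because `isna` and `isnull` are aliases. A small sketch, assuming a stand-in frame named `demo` with the same values as `df`, that also shows the share of missing values per column and the rows that contain a gap:

```python
import numpy as np
import pandas as pd

# Stand-in for df, rebuilt here so the example is self-contained
demo = pd.DataFrame({
    'Name': ['prachi', 'chaitnya', 'gunjan', 'ahnis', 'vani', 'muskan', 'mitanshi'],
    'Age': [22, 22, 22, 23, np.nan, 21, 23],
    'Class': ['Btech', 'BCA', 'Mtech', np.nan, 'BCA', 'BCA', 'Mtech'],
})

# isnull() and isna() produce identical boolean masks
same_mask = demo.isnull().equals(demo.isna())

# The mean of a boolean mask is the fraction of missing values per column
missing_share = demo.isna().mean()

# Keep only the rows that have at least one missing value
rows_with_gaps = demo[demo.isna().any(axis=1)]

print(same_mask)
print(missing_share)
print(rows_with_gaps)
```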

In [10]:
df

,Name,Age,Class
0,prachi,22.0,Btech
1,chaitnya,22.0,BCA
2,gunjan,22.0,Mtech
3,ahnis,23.0,
4,vani,,BCA
5,muskan,21.0,BCA
6,mitanshi,23.0,Mtech


In [11]:
df.dropna()

,Name,Age,Class
0,prachi,22.0,Btech
1,chaitnya,22.0,BCA
2,gunjan,22.0,Mtech
5,muskan,21.0,BCA
6,mitanshi,23.0,Mtech


In [12]:
df.dropna(axis=1)

,Name
0,prachi
1,chaitnya
2,gunjan
3,ahnis
4,vani
5,muskan
6,mitanshi


In [13]:
df.dropna(axis=0)

,Name,Age,Class
0,prachi,22.0,Btech
1,chaitnya,22.0,BCA
2,gunjan,22.0,Mtech
5,muskan,21.0,BCA
6,mitanshi,23.0,Mtech
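
`dropna` with `axis=0` (the default) removes every row that has any missing value. When that is too aggressive, the `subset` and `how` parameters narrow the rule. A minimal sketch on a stand-in frame named `demo` (a hypothetical name; the values mirror `df`):

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'Name': ['prachi', 'chaitnya', 'gunjan', 'ahnis', 'vani', 'muskan', 'mitanshi'],
    'Age': [22, 22, 22, 23, np.nan, 21, 23],
    'Class': ['Btech', 'BCA', 'Mtech', np.nan, 'BCA', 'BCA', 'Mtech'],
})

# Drop a row only when Age is missing; a missing Class is tolerated
age_known = demo.dropna(subset=['Age'])

# Drop a row only when every column in it is missing
not_empty = demo.dropna(how='all')

print(len(age_known), len(not_empty))
```

Both calls return new frames and leave `demo` unchanged, just as the `dropna` calls above leave `df` unchanged.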


In [14]:
df.rename(columns={'Age': 'age'})

,Name,age,Class
0,prachi,22.0,Btech
1,chaitnya,22.0,BCA
2,gunjan,22.0,Mtech
3,ahnis,23.0,
4,vani,,BCA
5,muskan,21.0,BCA
6,mitanshi,23.0,Mtech
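
Note that `rename` returns a new DataFrame; `df` itself still has the column `Age`, which is why the following cells keep using `df.Age`. A short sketch, using a hypothetical two-column stand-in named `demo`:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'Name': ['prachi', 'vani'],
    'Age': [22, np.nan],
})

# rename returns a copy with the new label; demo keeps its original columns
renamed = demo.rename(columns={'Age': 'age'})

# A callable is applied to every column label, here lower-casing all of them
lowered = demo.rename(columns=str.lower)

print(list(demo.columns))
print(list(renamed.columns))
print(list(lowered.columns))
```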


In [15]:
df.Age.mean()

22.166666666666668

In [16]:
df.fillna(df.Age.mean())

,Name,Age,Class
0,prachi,22.0,Btech
1,chaitnya,22.0,BCA
2,gunjan,22.0,Mtech
3,ahnis,23.0,22.166667
4,vani,22.166667,BCA
5,muskan,21.0,BCA
6,mitanshi,23.0,Mtech
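
In the output above, the missing `Class` in row 3 was also filled with 22.166667, because a scalar passed to `fillna` is applied to every column. One way to restrict the fill is to pass a dict that maps column names to fill values. A sketch on a stand-in frame named `demo` (hypothetical name, same values as `df`):

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'Name': ['prachi', 'chaitnya', 'gunjan', 'ahnis', 'vani', 'muskan', 'mitanshi'],
    'Age': [22, 22, 22, 23, np.nan, 21, 23],
    'Class': ['Btech', 'BCA', 'Mtech', np.nan, 'BCA', 'BCA', 'Mtech'],
})

# Only the columns named in the dict are filled; Class keeps its missing value
filled = demo.fillna({'Age': demo['Age'].mean()})

print(filled)
```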


In [17]:
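# Assign the result back to the column; fillna(inplace=True) on df.Age is a chained
# operation that newer pandas versions warn about and may not apply to df itself
df['Age'] = df['Age'].fillna(df['Age'].mean())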

In [18]:
df

,Name,Age,Class
0,prachi,22.0,Btech
1,chaitnya,22.0,BCA
2,gunjan,22.0,Mtech
3,ahnis,23.0,
4,vani,22.166667,BCA
5,muskan,21.0,BCA
6,mitanshi,23.0,Mtech


In [19]:
df.describe(include='object').Class.top

'BCA'

In [20]:
# Assign the result back instead of calling fillna(inplace=True) on df.Class
df['Class'] = df['Class'].fillna(df.describe(include='object').Class.top)

In [21]:
df

,Name,Age,Class
0,prachi,22.0,Btech
1,chaitnya,22.0,BCA
2,gunjan,22.0,Mtech
3,ahnis,23.0,BCA
4,vani,22.166667,BCA
5,muskan,21.0,BCA
6,mitanshi,23.0,Mtech
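
The most frequent category can also be read directly from the column with `mode()`, without going through `describe`. This is an alternative sketch, not the approach used in the cells above; `demo` is a hypothetical stand-in for the `Class` column of `df`.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'Class': ['Btech', 'BCA', 'Mtech', np.nan, 'BCA', 'BCA', 'Mtech'],
})

# value_counts shows how often each category occurs (missing values are skipped)
counts = demo['Class'].value_counts()

# mode() returns a Series because ties are possible; [0] takes the first entry
most_common = demo['Class'].mode()[0]

filled_class = demo['Class'].fillna(most_common)

print(counts)
print(most_common)
```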


In [22]:
athletes = pd.read_csv('olympics_athletes_dataset.csv')

In [23]:
athletes.head()

,athlete_id,athlete_name,gender,age,date_of_birth,nationality,country_name,sport,event,games_type,...,bronze_medals,country_total_gold,country_total_medals,country_first_participation,country_best_rank,is_record_holder,coach_name,height_cm,weight_kg,notes
0,ATH-00001,Svetlana Jung,Female,19,2005-12-04,AUT,Austria,Rowing,Four W,Summer,...,1,59,196,1896,18,No,Wei Ping,175.9,73.7,-
1,ATH-00002,Mary Yamamoto,Female,37,1987-07-11,MEX,Mexico,Ski Jumping,Normal Hill Team,Winter,...,5,14,72,1924,35,No,Yury Zakharevich,165.4,68.3,Olympic Debut
2,ATH-00003,Oksana Volkov,Female,37,1987-02-02,BUL,Bulgaria,Figure Skating,Women's Singles,Winter,...,1,54,224,1896,15,No,Alberto Salazar,164.2,67.2,-
3,ATH-00004,Rui Suzuki,Male,32,1992-12-08,HKG,Hong Kong,Triathlon,Men's Triathlon,Summer,...,0,3,9,1952,60,Olympic Record,Marcus O'Sullivan,190.0,76.0,First from country
4,ATH-00005,Natalya Grigoryan,Female,27,1997-11-15,SWE,Sweden,Triathlon,Men's Triathlon,Summer,...,0,200,648,1896,4,No,John Smith,175.8,60.9,Season Best


In [24]:
a = pd.read_excel('data.xlsx')

In [25]:
a

,Unnamed: 0,Name,Age,Class
0,0,prachi,22.0,Btech
1,1,chaitnya,22.0,BCA
2,2,gunjan,22.0,Mtech
3,3,ahnis,23.0,
4,4,vani,,BCA
5,5,muskan,21.0,BCA
6,6,mitanshi,23.0,Mtech


In [26]:
a.drop('Unnamed: 0', axis=1, inplace=True)

In [27]:
a

,Name,Age,Class
0,prachi,22.0,Btech
1,chaitnya,22.0,BCA
2,gunjan,22.0,Mtech
3,ahnis,23.0,
4,vani,,BCA
5,muskan,21.0,BCA
6,mitanshi,23.0,Mtech
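
The extra `Unnamed: 0` column appears because the index was written to the file as an ordinary column and then read back as data. `to_excel` and `read_excel` accept `index` and `index_col` parameters to avoid this. The sketch below demonstrates the same round-trip behavior with `to_csv` and `read_csv` on an in-memory buffer, an assumption made only so the example runs without an Excel engine installed; `demo` is a hypothetical stand-in frame.

```python
import io

import numpy as np
import pandas as pd

demo = pd.DataFrame({'Name': ['prachi', 'vani'], 'Age': [22, np.nan]})

# Default: the index is written as an unnamed first column
buf = io.StringIO()
demo.to_csv(buf)
buf.seek(0)
with_extra = pd.read_csv(buf)

# Option 1: do not write the index at all
buf_clean = io.StringIO()
demo.to_csv(buf_clean, index=False)
buf_clean.seek(0)
clean = pd.read_csv(buf_clean)

# Option 2: tell the reader that the first column is the index
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)

print(list(with_extra.columns))
print(list(clean.columns))
print(list(restored.columns))
```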


In [28]:
athletes.head(1)

,athlete_id,athlete_name,gender,age,date_of_birth,nationality,country_name,sport,event,games_type,...,bronze_medals,country_total_gold,country_total_medals,country_first_participation,country_best_rank,is_record_holder,coach_name,height_cm,weight_kg,notes
0,ATH-00001,Svetlana Jung,Female,19,2005-12-04,AUT,Austria,Rowing,Four W,Summer,...,1,59,196,1896,18,No,Wei Ping,175.9,73.7,-


In [29]:
athletes.columns

Index(['athlete_id', 'athlete_name', 'gender', 'age', 'date_of_birth',
       'nationality', 'country_name', 'sport', 'event', 'games_type', 'year',
       'host_city', 'team_or_individual', 'medal', 'result_value',
       'result_unit', 'total_olympics_attended', 'total_medals_won',
       'gold_medals', 'silver_medals', 'bronze_medals', 'country_total_gold',
       'country_total_medals', 'country_first_participation',
       'country_best_rank', 'is_record_holder', 'coach_name', 'height_cm',
       'weight_kg', 'notes'],
      dtype='object')

In [30]:
athletes.shape

(8500, 30)

In [31]:
athletes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8500 entries, 0 to 8499
Data columns (total 30 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   athlete_id                   8500 non-null   object 
 1   athlete_name                 8500 non-null   object 
 2   gender                       8500 non-null   object 
 3   age                          8500 non-null   int64  
 4   date_of_birth                8500 non-null   object 
 5   nationality                  8500 non-null   object 
 6   country_name                 8500 non-null   object 
 7   sport                        8500 non-null   object 
 8   event                        8500 non-null   object 
 9   games_type                   8500 non-null   object 
 10  year                         8500 non-null   int64  
 11  host_city                    8500 non-null   object 
 12  team_or_individual           8500 non-null   object 
 13  medal             

In [32]:
athletes.isnull().sum()

athlete_id                     0
athlete_name                   0
gender                         0
age                            0
date_of_birth                  0
nationality                    0
country_name                   0
sport                          0
event                          0
games_type                     0
year                           0
host_city                      0
team_or_individual             0
medal                          0
result_value                   0
result_unit                    0
total_olympics_attended        0
total_medals_won               0
gold_medals                    0
silver_medals                  0
bronze_medals                  0
country_total_gold             0
country_total_medals           0
country_first_participation    0
country_best_rank              0
is_record_holder               0
coach_name                     0
height_cm                      0
weight_kg                      0
notes                          0
dtype: int64

In [33]:
athletes.drop('date_of_birth', axis=1, inplace=True)
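
Because the cell above modifies `athletes` in place, running it a second time raises a `KeyError`: the column is already gone. A sketch of a variant that can be re-run safely, using a small hypothetical frame named `sample` in place of the full dataset:

```python
import pandas as pd

# Hypothetical three-column sample standing in for the athletes frame
sample = pd.DataFrame({
    'athlete_id': ['ATH-00001', 'ATH-00002'],
    'age': [19, 37],
    'date_of_birth': ['2005-12-04', '1987-07-11'],
})

# columns= is equivalent to passing the label with axis=1; a new frame is returned
trimmed = sample.drop(columns=['date_of_birth'])

# errors='ignore' skips labels that are not present, so repeating the call is harmless
trimmed_again = trimmed.drop(columns=['date_of_birth'], errors='ignore')

print(list(trimmed.columns))
```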

In [34]:
athletes

,athlete_id,athlete_name,gender,age,nationality,country_name,sport,event,games_type,year,...,bronze_medals,country_total_gold,country_total_medals,country_first_participation,country_best_rank,is_record_holder,coach_name,height_cm,weight_kg,notes
0,ATH-00001,Svetlana Jung,Female,19,AUT,Austria,Rowing,Four W,Summer,1896,...,1,59,196,1896,18,No,Wei Ping,175.9,73.7,-
1,ATH-00002,Mary Yamamoto,Female,37,MEX,Mexico,Ski Jumping,Normal Hill Team,Winter,1960,...,5,14,72,1924,35,No,Yury Zakharevich,165.4,68.3,Olympic Debut
2,ATH-00003,Oksana Volkov,Female,37,BUL,Bulgaria,Figure Skating,Women's Singles,Winter,1932,...,1,54,224,1896,15,No,Alberto Salazar,164.2,67.2,-
3,ATH-00004,Rui Suzuki,Male,32,HKG,Hong Kong,Triathlon,Men's Triathlon,Summer,2012,...,0,3,9,1952,60,Olympic Record,Marcus O'Sullivan,190.0,76.0,First from country
4,ATH-00005,Natalya Grigoryan,Female,27,SWE,Sweden,Triathlon,Men's Triathlon,Summer,1900,...,0,200,648,1896,4,No,John Smith,175.8,60.9,Season Best
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8495,ATH-08496,Dorothy Meyer,Female,29,KEN,Kenya,Ice Hockey,Women's Ice Hockey,Winter,1994,...,2,38,109,1956,20,No,Renato Canova,187.9,77.4,Disqualified (false start)
8496,ATH-08497,Romain Ito,Male,18,DPR,North Korea,Fencing,Foil Team M,Summer,2024,...,3,16,54,1964,35,No,Vladimir Alekna,191.9,64.9,First from country
8497,ATH-08498,Magdalena Garcia,Female,36,AUS,Australia,Short Track Speed Skating,1500m W,Winter,2002,...,1,170,554,1896,4,No,Andrei Chemerkin,159.6,62.2,Comeback after injury
8498,ATH-08499,Leyla Kwon,Female,29,BEL,Belgium,Shooting,10m Air Pistol M,Summer,2004,...,2,43,158,1900,20,No,Sandor Szabo,184.9,75.9,Personal Best


In [35]:
athletes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8500 entries, 0 to 8499
Data columns (total 29 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   athlete_id                   8500 non-null   object 
 1   athlete_name                 8500 non-null   object 
 2   gender                       8500 non-null   object 
 3   age                          8500 non-null   int64  
 4   nationality                  8500 non-null   object 
 5   country_name                 8500 non-null   object 
 6   sport                        8500 non-null   object 
 7   event                        8500 non-null   object 
 8   games_type                   8500 non-null   object 
 9   year                         8500 non-null   int64  
 10  host_city                    8500 non-null   object 
 11  team_or_individual           8500 non-null   object 
 12  medal                        8500 non-null   object 
 13  result_value      