Assignment_02_Sampling

Done by Vivek Arora

Sampling is a technique for drawing conclusions about a population by analyzing a representative subset instead of examining every individual. The initial dataset was heavily imbalanced, with 763 non-fraudulent cases and only 9 fraudulent cases. To correct this, oversampling was applied: additional instances of the minority class (fraudulent cases) were generated until it matched the majority class (non-fraudulent cases), and the balanced result was consolidated into a single data frame.
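The balancing step can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: it assumes random oversampling (duplicating minority rows with replacement), and the `Class` label key, the `random_oversample` helper and the toy rows are hypothetical stand-ins for the real dataset.

```python
import random
from collections import Counter

def random_oversample(rows, label_key, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw the shortfall with replacement from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced

# Toy rows with the class counts quoted above: 763 non-fraud, 9 fraud.
rows = [{"id": i, "Class": 0} for i in range(763)]
rows += [{"id": 763 + i, "Class": 1} for i in range(9)]

balanced = random_oversample(rows, "Class")
counts = Counter(row["Class"] for row in balanced)
print(counts)
```

Because only 9 distinct fraudulent rows exist, a balanced set built this way contains many exact duplicates, which is worth keeping in mind when reading accuracy figures computed on it.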

The following sampling methods were employed:

Simple Random Sampling: Selecting samples randomly from the population.
Systematic Sampling: Choosing samples at regular intervals after a random starting point.
Cluster Sampling: Dividing the population into clusters and randomly selecting entire clusters.
Stratified Sampling: Dividing the population into subgroups (strata) based on specific criteria, then sampling from each subgroup.
Bootstrap Sampling: Resampling with replacement to generate multiple samples from the original dataset.
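The five techniques listed above can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: the function names, the sample sizes, and the `Class` and `cluster` keys are hypothetical and are not taken from the notebook.

```python
import random

def simple_random_sample(population, n, rng):
    # Every item has the same chance of selection; no repeats.
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    # Random start, then every step-th item (requires n <= len(population)).
    step = len(population) // n
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(n)]

def cluster_sample(population, cluster_of, n_clusters, rng):
    # Group items by cluster, then keep whole randomly chosen clusters.
    clusters = {}
    for item in population:
        clusters.setdefault(cluster_of(item), []).append(item)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for key in chosen for item in clusters[key]]

def stratified_sample(population, stratum_of, fraction, rng):
    # Sample the same fraction from every stratum.
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    sample = []
    for key in sorted(strata):
        group = strata[key]
        sample.extend(rng.sample(group, round(len(group) * fraction)))
    return sample

def bootstrap_sample(population, n, rng):
    # Sampling with replacement: the same item may appear more than once.
    return rng.choices(population, k=n)

rng = random.Random(42)
population = [{"id": i, "Class": i % 2, "cluster": i % 10} for i in range(1000)]

srs = simple_random_sample(population, 100, rng)
systematic = systematic_sample(population, 100, rng)
clustered = cluster_sample(population, lambda r: r["cluster"], 3, rng)
stratified = stratified_sample(population, lambda r: r["Class"], 0.1, rng)
bootstrap = bootstrap_sample(population, 100, rng)
```

Only the bootstrap sample can contain the same row twice; the other four draw without replacement from the population they are given.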

Following the generation of five distinct samples using these techniques, five models were applied to each sample. The accuracies of each model for a given sample are summarized in the following table:

Sampling Technique	Random Forest	Logistic Regression	Naive Bayes	Decision Trees	KNN
Simple Random Sampling	0.9870	0.8831	0.7013	0.9610	0.8701
Systematic Sampling	1.0000	0.8926	0.7450	1.0000	0.9329
Cluster Sampling	1.0000	0.9670	1.0000	1.0000	0.9890
Stratified Sampling	1.0000	0.9030	0.7239	0.9925	0.9552
Bootstrap Sampling	1.0000	0.9250	0.7500	0.9625	0.9375

In the table above, each row corresponds to a sampling technique, and each column gives the accuracy achieved by one model on the sample generated with that technique.
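One way to produce a row of such a table is to fit the five model types on a train/test split of a sample and record the test accuracy. The sketch below uses scikit-learn with default hyperparameters on synthetic data; the 80/20 split, the `evaluate_models` helper and the `make_classification` data are assumptions for illustration, so its scores will not match the figures reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def evaluate_models(X, y, seed=42):
    """Fit five classifiers on one sample and return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    models = {
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Naive Bayes": GaussianNB(),
        "Decision Trees": DecisionTreeClassifier(random_state=seed),
        "KNN": KNeighborsClassifier(),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores

# Synthetic stand-in for one sampled dataset (hypothetical data).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
scores = evaluate_models(X, y)
for name, acc in scores.items():
    print(f"{name}: {acc:.4f}")
```

Calling `evaluate_models` once per sample would give one row per sampling technique.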

Random Forest achieved the highest or tied-highest accuracy on every sample. On the Stratified Sampling sample it reached 1.0000 and outperformed all other models.
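The comparison can be checked directly from the accuracy table above. The snippet below copies the reported values into a dictionary, lists the top-scoring model or models for each sample, and averages each model's accuracy across the five samples; the variable names are illustrative only.

```python
# Accuracies copied from the table above (rows: samples, columns: models).
accuracy = {
    "Simple Random Sampling": {"Random Forest": 0.9870, "Logistic Regression": 0.8831,
                               "Naive Bayes": 0.7013, "Decision Trees": 0.9610, "KNN": 0.8701},
    "Systematic Sampling": {"Random Forest": 1.0000, "Logistic Regression": 0.8926,
                            "Naive Bayes": 0.7450, "Decision Trees": 1.0000, "KNN": 0.9329},
    "Cluster Sampling": {"Random Forest": 1.0000, "Logistic Regression": 0.9670,
                         "Naive Bayes": 1.0000, "Decision Trees": 1.0000, "KNN": 0.9890},
    "Stratified Sampling": {"Random Forest": 1.0000, "Logistic Regression": 0.9030,
                            "Naive Bayes": 0.7239, "Decision Trees": 0.9925, "KNN": 0.9552},
    "Bootstrap Sampling": {"Random Forest": 1.0000, "Logistic Regression": 0.9250,
                           "Naive Bayes": 0.7500, "Decision Trees": 0.9625, "KNN": 0.9375},
}

# All models that share the top accuracy for each sample (ties included).
winners = {
    sample: [m for m, acc in row.items() if acc == max(row.values())]
    for sample, row in accuracy.items()
}

# Mean accuracy of each model across the five samples.
models = list(next(iter(accuracy.values())))
mean_accuracy = {
    m: sum(row[m] for row in accuracy.values()) / len(accuracy) for m in models
}

for sample, best in winners.items():
    print(f"{sample}: {', '.join(best)}")
for m, acc in mean_accuracy.items():
    print(f"{m}: {acc:.4f}")
```

Listing ties explicitly matters here: Decision Trees match Random Forest on the systematic and cluster samples, and Naive Bayes also reaches 1.0000 on the cluster sample.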

Done by : Vivek Arora
Roll_No : 102203778
Group : 3C42
