Preprocess the dataset : Handling NULL values & Duplicate features #1

sagnik1511 · 2021-10-01T08:31:18Z

Problem Statement:

First, run the notebook, then preprocess the dataset using the steps below:

Steps:

1. Replacing null values: If null values are present, handle them intelligently.
2. Remove unwanted features or rows: If there are features with low variance or constant features, omit them. If there is duplicate data in the dataset, omit that too.

The changes should be reflected in the notebook.
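
The two steps above could be sketched roughly as follows with pandas. This is only an illustration, not the notebook's actual code: the function name `preprocess` and the choice of median (numeric columns) and most frequent value (other columns) for filling nulls are assumptions, since the issue only asks for nulls to be handled "intelligently".

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill nulls, drop constant columns, drop duplicate rows."""
    df = df.copy()

    # Step 1: replace nulls (assumed strategy: median for numeric
    # columns, most frequent value for everything else).
    for col in df.columns:
        if df[col].isna().any():
            if pd.api.types.is_numeric_dtype(df[col]):
                df[col] = df[col].fillna(df[col].median())
            else:
                mode = df[col].mode(dropna=True)
                if not mode.empty:
                    df[col] = df[col].fillna(mode.iloc[0])

    # Step 2a: drop constant features (columns with a single distinct value).
    constant_cols = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    df = df.drop(columns=constant_cols)

    # Step 2b: drop duplicate rows and renumber the index.
    return df.drop_duplicates().reset_index(drop=True)
```

Calling `preprocess(df)` returns a new DataFrame and leaves the input unchanged, so the raw data stays available for comparison in the notebook.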

nilupulmanodya · 2021-10-04T09:30:02Z

I would like to contribute to this issue.

sagnik1511 · 2021-10-04T09:37:41Z

Assigned @nilupulmanodya.

nilupulmanodya · 2021-10-04T11:23:03Z

Assigned @nilupulmanodya.

Need clarification:
2. Remove unwanted features or rows: If there are features with low variance or constant features, omit them. If there is duplicate data in the dataset, omit that too.

The dataset has 19 columns, 4 of which are categorical ('artists', 'id', 'name' and 'release_date'). 'id' has already been dropped. Some of the other features also have low variance, but I think that data is needed for future processing. Do those low-variance features need to be omitted?

sagnik1511 · 2021-10-04T11:48:44Z

Since specific numeric features can be important for clustering, you can avoid dropping them. However, if you find any feature with an almost constant distribution, feel free to drop it.
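
One possible way to spot features with an "almost constant distribution" is to check how much of a column is taken up by its single most common value. The sketch below is an assumption about how that could be done: the helper name `near_constant_columns` and the default `threshold` of 0.99 are illustrative and were not specified in this thread.

```python
import pandas as pd


def near_constant_columns(df: pd.DataFrame, threshold: float = 0.99) -> list:
    """Hypothetical helper: list columns dominated by one value.

    A column is flagged when its most frequent value (NaN included)
    accounts for at least `threshold` of all rows.
    """
    flagged = []
    for col in df.columns:
        # value_counts sorts by frequency, so the first entry is the top share.
        shares = df[col].value_counts(normalize=True, dropna=False)
        if not shares.empty and shares.iloc[0] >= threshold:
            flagged.append(col)
    return flagged
```

The returned names could then be passed to `df.drop(columns=...)`, after checking each one by hand so that numeric features that matter for clustering are kept.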

P.S. Remember to state the modifications you made in your PR. Happy contributing!

sagnik1511 added the enhancement (New feature or request) label Oct 1, 2021

niloysikdar added the hacktoberfest (Issue is under Hacktoberfest) label Oct 2, 2021

sagnik1511 assigned nilupulmanodya Oct 4, 2021

This was referenced Oct 4, 2021

Please check and tell me needed changes #6

Closed

Please check and tell me needed changes #7

Merged

sagnik1511 closed this as completed Oct 8, 2021
