final-project-team-python-byte

M.S. Data Science at Indiana University | Python I590 | Final Project | Team Python Byte

Phase 1

Name 1: Sachin Sharma

Set up the project on GitHub with the required folder structure
Added Folder structure with Code and Data
Added 'BreastCancerWisconsin.csv' file in data folder
Added main file 'FinalAssignment-Phase1.ipynb' in code folder
Implemented solution for phase 1 assignment as per instruction and added markup
- Data cleaning - report Number of NaN, replace ?, impute NaN by the column mean
- Data Stats
- Plotting: scatter plot and bar plot must have titles, x-axis and y-axis names, and non-default colors; the 9 histograms (subfigures) do not need titles
- Import libraries, Proper data import
- Github set up with one folder for code and one folder for dataset
- Readme file with contributions
Merged my branch 'final-project-phase-1-sacshar' to master and verified that the code runs successfully

Name 2: Leonardo

Reviewed the code and completed each step of phase I assignment.
Made the following changes:

• Changed single histogram to have one histogram for each variable.

• Changed chart labels/title

• Changed wording in markdown cells

• Formatted scatter plots.

• Changed bar plot colors and added titles for cancer types

• Rounded .describe() summary statistic results

• Changed scatter plot colors

• Added correlation analysis

• Added standard deviation analysis

• Added variance analysis

• Added mode analysis

• Added final summarization of data narrative

Pushed desktop branch to GitHub branch.
Review done and merged to master.
Submitted assignment in Canvas.
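
The rounded summary statistics and the correlation, standard deviation, variance, and mode analyses listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the small DataFrame is a hypothetical stand-in for a few columns of 'BreastCancerWisconsin.csv', and rounding to two decimals is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for a few feature columns of the dataset;
# the values are illustrative and not taken from the real file.
df = pd.DataFrame({
    "A2": [5, 5, 3, 6, 4, 8],
    "A3": [1, 4, 1, 8, 1, 10],
    "A4": [1, 4, 1, 8, 1, 10],
})

summary = df.describe().round(2)   # rounded summary statistics
correlation = df.corr().round(2)   # pairwise Pearson correlation
std_dev = df.std().round(2)        # sample standard deviation (ddof=1)
variance = df.var().round(2)       # sample variance (ddof=1)
modes = df.mode().iloc[0]          # first mode of each column

print(summary, correlation, std_dev, variance, modes, sep="\n\n")
```

Note that pandas uses the sample formulas (ddof=1) for `std()` and `var()` by default, so the results differ slightly from NumPy's population defaults.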

Name 3: Mario

Reviewed Code and made sure the same calculations could be done in PyCharm as were done in Jupyter.
- Made sure all calculations are done and consistent throughout the entire project. For example, the mean was calculated and filled into the correct NaN fields.
Made sure the code would work in a variety of environments
QA process: checked whether different sets of code arrived at the same conclusion (for example, handling a task in pandas two different ways). Made sure each bullet point from the list of tasks in the Assignment Details was completed. The specifics are below:
- Completed: Replace ? by NaN in column A7. Use ____.replace('?', np.NaN) - but properly specify A7 column. (Completed under Markdown 'Replace ? by NaN in column A7')
- Completed: After replacing - your column needs to be converted back to numeric. Apply pandas function pd.to_numeric() for column A7 - Shown by printing the datatype as object, then converting the column with the to_numeric function
- Completed: Report how many NaN. Use isnull() function applied to the dataframe. Then you can use arithmetic sum() - Shown after Markdown 'Check Null Values and Total Count'
- Completed: Replace NaN values with the mean of column A7. Use fillna() - Shown after 'Replace NaN values with the mean of column A7'
- Completed: Provide the summary statistics - Shown after Markdown 'Provide the summary statistics'
- Completed: Find number of columns and number of rows - Shown after Markdown 'Find number of columns and number of rows' by assigning the first value to rows and the second value to columns.
- Completed: Report how many unique id values (column Scn) - hint the length of unique ids - Shown after Markdown 'Report how many unique id values (column Scn)', which uses the unique function to return the ids and len to count them.
- Completed: Draw histograms for columns A2-A10. Note: you need to subset your dataframe - slice only columns A2-A10. Use histogram function, add a color of your choice. Note you need to run hist() function on your dataframe with selected columns only. It will output all 9 columns as subplots. Here do not worry about individual titles, y and x axis. You could adjust bins and alpha (opacity) on your histograms
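
The cleaning and counting steps in the checklist above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: the miniature DataFrame is hypothetical (column names follow the README, the rows are made up), and `np.nan` is used in place of `np.NaN` because that alias was removed in NumPy 2.0.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset; only the column names come
# from the README, the rows are illustrative.
df = pd.DataFrame({
    "Scn": [1000025, 1002945, 1015425, 1015425, 1017023],
    "A2": [5, 5, 3, 3, 4],
    "A7": ["1", "10", "?", "2", "?"],
    "CLASS": [2, 2, 2, 2, 4],
})

# Replace '?' with NaN in column A7 only, then convert back to numeric.
df["A7"] = df["A7"].replace("?", np.nan)
df["A7"] = pd.to_numeric(df["A7"])

# Report NaN counts per column and in total.
nan_per_column = df.isnull().sum()
total_nan = int(nan_per_column.sum())

# Impute the missing A7 values with the column mean.
df["A7"] = df["A7"].fillna(df["A7"].mean())

# Number of rows and columns, and number of unique ids.
n_rows, n_cols = df.shape
n_unique_ids = len(df["Scn"].unique())

print(total_nan, n_rows, n_cols, n_unique_ids)
```

For the histogram step, one option is `df.loc[:, "A2":"A10"].hist(...)` on the full dataset, which requires matplotlib and accepts arguments such as `bins`, `color`, and `alpha`.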

Phase 2

Name 1: Sachin Sharma

Added file 'FinalAssignment-Phase2.ipynb' in code folder
Implemented solution for phase 2 assignment as per instructions and added markup
- Use KMeans algorithm (do not use column CLASS)
- Find the optimal number of clusters
- Revise data variation
- Implement normalization
Updated and merged my branch to master and verified that the code runs successfully
Reviewed the code and brought it to closure
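
As a rough sketch of the Phase 2 steps listed above (normalization, KMeans without the CLASS column, and choosing the number of clusters from inertia), the following uses scikit-learn. The synthetic feature matrix, the choice of `StandardScaler` as the normalization method, the range of k values, and the final k=2 are all assumptions made for illustration; the notebook may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix standing in for columns A2..A10
# (CLASS is deliberately not included).
rng = np.random.default_rng(0)
low = rng.integers(1, 4, size=(30, 9))
high = rng.integers(7, 11, size=(30, 9))
X = np.vstack([low, high]).astype(float)

# Normalization (assumed method): zero mean, unit variance per feature.
X_scaled = StandardScaler().fit_transform(X)

# Inertia for k = 1..6; the "elbow" in this curve suggests a cluster count.
inertias = []
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias.append(model.inertia_)

# Fit the chosen model and inspect the shape of the centroid array.
best = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
print(inertias)
print(best.cluster_centers_.shape)
```

Plotting `inertias` against k gives the inertia plot mentioned in this phase; the centroid array has one row per cluster and one column per feature.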

Name 2: Leonardo

Reviewed the code and completed each step of phase II assignment.
Added the scatter matrix.
Increased the size of the inertia plot.
Added the Optimal Number of Clusters Analysis.
Rounded the standard deviation values.
Created the plot showing all standard deviation values.
Created a different box plot that included all variables in one plot.
Added the Data Variation Analysis.
Pushed desktop branch to GitHub branch.
Review completed and merged to master.

Name 3: Mario Angelier

Reviewed code. Made sure each item was complete based on phase II requirements.
Added shape of centroid array. Allows users to see the shape in addition to the details
Added context on which features have the most variability
Added context on the ideal number of clusters from KMeans on this dataset
Made sure we followed requirements in order as they were requested
Pushed local branch
Team merged into final master branch
Extra note: it seems that in this phase we duplicated some work; however, in both cases the team members independently arrived at the same result

Phase 3

Name 1: Sachin Sharma

Reviewed the complete code as per instructions.
Verified the final code through to closure.
Filtered and merged all final work into the phase 3 file.
Merged final code to master branch.

Name 2: Mario Angelier

Created Dataframe to import into KMeans model
Created labels based on new cluster parameters
Merged labels into original dataframe
Created the Error Rate function
Printed out each Error Rate based on the return of the function.
Pushed local Phase 3 branch to the master
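
One way an error rate function like the one described above could look is sketched below. The README does not give the function's definition, so everything here is an assumption: the name `error_rates`, the column names 'Class' and 'Predicted', the idea that cluster labels have already been mapped to the same coding as 'Class', and the sample values.

```python
import pandas as pd

def error_rates(df, actual_col="Class", predicted_col="Predicted"):
    """Return the misclassification rate overall and per actual class.

    Assumes the predicted labels already use the same coding as the
    actual class column.
    """
    wrong = df[actual_col] != df[predicted_col]
    rates = {"total": float(wrong.mean())}
    # Group the mismatch flags by actual class to get per-class rates.
    for value, group in wrong.groupby(df[actual_col]):
        rates[value] = float(group.mean())
    return rates

# Hypothetical labels merged into one DataFrame; values are illustrative.
df = pd.DataFrame({
    "Class":     [2, 2, 2, 2, 4, 4, 4, 4, 4, 2],
    "Predicted": [2, 2, 4, 2, 4, 2, 2, 4, 4, 2],
})

rates = error_rates(df)
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}")
```

Returning a dictionary keeps the overall and per-class rates together, so each can be printed from the single return value.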

Name 3: Leonardo

Reviewed the code and completed each step of phase III assignment.
Adjusted code to remove "SCN" column and keep "Class"
Removed unnecessary analysis
Added the density plot for analysis.
Added the Report Statement.
Pushed desktop branch to GitHub branch.
Review completed and merged to master.
