sidg4/ics434-s24

 
 


ICS 434

1_Introduction.ipynb

This notebook introduces the basics of data science, its origins, and its key components. Data science is an area that brings together methods from statistics and computer science to handle, analyze, and get insights from data. The notebook covers the essential parts of data science: collecting data, preparing and cleaning it, exploring it to spot trends, building models to predict future trends, and visualizing data in clear and informative ways.

2_Intro_to_pandas_Python_package.ipynb

This notebook provides an overview of packages and modules in Python. It explains that a package is a structured directory of Python modules, each of which is a single Python file, and details how packages and modules can be imported and used to organize and reuse code.
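As a minimal sketch of the package and module structure described above, the following builds a throwaway package on disk and imports a module from it. The names `demo_pkg` and `stats_utils` are hypothetical and chosen only for illustration; they are not taken from the notebook.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package: a directory holding __init__.py plus one module file.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"  # hypothetical package name
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "stats_utils.py").write_text(
    "def mean(xs):\n    return sum(xs) / len(xs)\n"
)

# Make the parent directory importable, then import the module from the package.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
stats_utils = importlib.import_module("demo_pkg.stats_utils")

result = stats_utils.mean([1, 2, 3, 4])
print(result)  # 2.5
```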

3_intro_to_pandas.ipynb

This notebook introduces Pandas, the leading library for data wrangling. Specifically, the notebook introduces two pivotal data structures essential for data wrangling (Series and DataFrames), and provides an in-depth exploration of indexing techniques for efficient data handling.
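A small sketch of the two data structures and of label-based versus position-based indexing might look like the following. The data values and labels are invented for illustration and do not come from the notebook.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame is a table of columns that share one row index.
df = pd.DataFrame(
    {"city": ["Hilo", "Kona", "Lihue"], "temp_f": [78, 82, 80]},
    index=["x", "y", "z"],
)

by_label = s.loc["b"]         # label-based lookup
by_position = s.iloc[0]       # position-based lookup
cell = df.loc["y", "temp_f"]  # row label, column label
warm = df[df["temp_f"] > 79]  # boolean-mask selection of rows

print(by_label, by_position, cell)
print(warm)
```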

4_exploratory_data_analysis.ipynb

This notebook provides a comprehensive introduction to exploratory data analysis using Pandas. It starts with general dataset attributes, such as the number of rows and columns and the data type of each column. It then covers descriptive statistics operations, such as calculating the mean and median, and explains the concept of axis in Pandas operations. The notebook also describes how missing values are handled, shows how to sort data, and concludes with practical examples of using basic Pandas plots for data visualization.
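The steps above can be sketched on a tiny, made-up table as follows. The column names and values are assumptions for illustration only, and plotting is omitted to keep the example dependency-light.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "height": [150.0, 160.0, np.nan, 170.0],
        "weight": [50.0, 60.0, 70.0, 80.0],
    }
)

n_rows, n_cols = df.shape    # dataset dimensions
dtypes = df.dtypes           # data type of each column

col_means = df.mean(axis=0)  # axis=0: one value per column (NaN skipped by default)
row_means = df.mean(axis=1)  # axis=1: one value per row

n_missing = df["height"].isna().sum()                   # count missing values
sorted_df = df.sort_values("weight", ascending=False)   # sort rows by a column

print(n_rows, n_cols)
print(col_means)
```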

5_arithmetic_ops_and_data_alignment.ipynb

This notebook provides a thorough overview of vectorization in Pandas and demonstrates the efficiency of vectorized operations over traditional loops, the concept of broadcasting in array manipulation, and how to apply arithmetic and comparison operations effectively in Pandas. Additionally, the notebook covers data querying and subsetting, highlighting the ease and speed of handling large datasets with these techniques.
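As a rough illustration of these ideas, the sketch below compares an explicit loop with the equivalent vectorized expression, broadcasts a scalar and a row of column means, and subsets with a comparison mask. The numbers are invented for the example.

```python
import numpy as np
import pandas as pd

prices = pd.Series([1.0, 2.0, 3.0, 4.0])

# Explicit Python loop versus the equivalent vectorized expression.
loop_result = [p * 1.1 for p in prices]
vec_result = prices * 1.1  # the scalar 1.1 is broadcast to every element

# Comparison operations are vectorized too and yield a boolean mask.
mask = prices > 2.0
subset = prices[mask]

# Broadcasting a row of column means across every row of a DataFrame.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
centered = df - df.mean()

print(vec_result.tolist())
print(subset.tolist())
print(centered)
```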

6_0_summary_statistics.ipynb

This notebook offers a concise overview of summary statistics, which are essential for data analysis. It covers measures of central tendency (mean, median, mode), measures of variability (range, variance, standard deviation), and quantiles (quartiles, percentiles), and highlights their role in describing a data distribution.
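For a quick, hedged illustration of these measures, the sketch below uses Python's standard `statistics` module on a made-up sample. The notebook itself may compute them with Pandas instead.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # made-up sample

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Variability
data_range = max(data) - min(data)
variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # square root of the sample variance

# Quantiles: the three cut points that split the data into four groups
quartiles = statistics.quantiles(data, n=4)

print(mean, median, mode, data_range, variance)
print(quartiles)
```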

12_intro_probability.ipynb

This Jupyter Notebook serves as an introduction to basic probability concepts and terminology. It also introduces a simulation technique that illustrates the long-term frequency of events by exploring a simple problem.
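The simulation idea can be sketched with a fair coin, which is an assumption here since the README does not name the problem the notebook uses. As the number of flips grows, the observed frequency of heads settles near the true probability of 0.5.

```python
import random


def heads_frequency(n_flips, rng):
    """Return the fraction of heads observed in n_flips simulated fair-coin flips."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


rng = random.Random(434)  # fixed seed so the run is repeatable
freqs = {n: heads_frequency(n, rng) for n in (10, 1_000, 100_000)}

for n, freq in freqs.items():
    print(f"{n:>7} flips: frequency of heads = {freq:.4f}")
```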

13_probability_distributions_binomial.ipynb

This Jupyter Notebook introduces the binomial probability distribution, providing a comprehensive exploration through practical examples.
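As a minimal sketch, the binomial probability mass function P(X = k) = C(n, k) p^k (1 - p)^(n - k) can be written with the standard library alone. The parameters n = 10 and p = 0.5 are illustrative choices, not values from the notebook.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5  # illustrative parameters
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(binomial_pmf(5, n, p))  # 0.24609375
print(sum(pmf))               # probabilities over all outcomes sum to 1 (up to rounding)
```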

14_probability_distributions_gaussian.ipynb

This Jupyter Notebook introduces the Gaussian probability distribution, providing a comprehensive exploration through practical examples.
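A brief sketch using the standard library's `statistics.NormalDist` shows the density and the familiar result that roughly 68% of the probability mass lies within one standard deviation of the mean. The standard normal (mean 0, standard deviation 1) is used here as an illustrative choice.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)  # standard normal, chosen for illustration

peak = dist.pdf(0.0)                                   # density at the mean
within_one_sigma = dist.cdf(1.0) - dist.cdf(-1.0)      # P(-1 <= X <= 1)

print(f"pdf(0) = {peak:.4f}")
print(f"P(-1 <= X <= 1) = {within_one_sigma:.4f}")
```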

15_kernel_density_estimation.ipynb

This Jupyter Notebook introduces kernel density estimation. It starts with an overview of histograms and their limitations, then moves on to the concept and application of kernel density estimation as a more effective method for estimating the probability density function of a random variable.
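A minimal hand-rolled sketch of kernel density estimation with a Gaussian kernel follows. The sample values and the bandwidth of 0.5 are assumptions for illustration, and the notebook may well rely on a library implementation instead.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde(x, samples, bandwidth):
    """Estimate the density at x as the average of kernels centered on each sample."""
    total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in samples)
    return total / (len(samples) * bandwidth)


samples = [1.0, 1.5, 2.0, 4.0, 4.2]  # made-up observations
step = 0.01
grid = [i * step for i in range(-500, 1100)]  # covers -5.0 up to 10.99
density = [kde(x, samples, bandwidth=0.5) for x in grid]

# A valid density integrates to about 1; check with a simple Riemann sum.
area = sum(density) * step
print(f"area under the estimate = {area:.4f}")
```

Unlike a histogram, the estimate does not depend on bin edges, only on the kernel and the bandwidth.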

16_KDE_bandwidth.ipynb

This Jupyter Notebook focuses on the estimation of bandwidth in kernel density estimation, detailing the methodologies and considerations involved in selecting an optimal bandwidth to accurately approximate the probability density function of a dataset.
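One widely used starting point for bandwidth selection is Silverman's rule of thumb, h = 0.9 · min(σ, IQR / 1.34) · n^(-1/5). The sketch below implements it with the standard library; whether the notebook uses this rule or another method (such as cross-validation) is not stated in this README, so treat it as an assumption.

```python
import statistics


def silverman_bandwidth(samples):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel."""
    n = len(samples)
    sigma = statistics.stdev(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    spread = min(sigma, (q3 - q1) / 1.34)  # robust measure of spread
    return 0.9 * spread * n ** (-1 / 5)


samples = [1.0, 1.5, 2.0, 4.0, 4.2, 3.1, 2.7, 2.2]  # made-up observations
h = silverman_bandwidth(samples)
print(f"rule-of-thumb bandwidth = {h:.3f}")
```

A smaller bandwidth produces a spikier estimate that follows individual points, while a larger one smooths away real structure, so the rule's output is best treated as a first guess.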

17_probability_distributions_poisson.ipynb

This Jupyter Notebook introduces the Poisson probability distribution, providing a comprehensive exploration through practical examples.
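As a minimal sketch, the Poisson probability mass function P(X = k) = λ^k e^(-λ) / k! can be computed directly. The rate λ = 3 is an illustrative choice, not a value from the notebook.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the expected count is lam."""
    return lam**k * exp(-lam) / factorial(k)


lam = 3.0  # illustrative rate
pmf = [poisson_pmf(k, lam) for k in range(50)]  # the tail beyond 49 is negligible

total = sum(pmf)
mean = sum(k * q for k, q in enumerate(pmf))
print(f"sum of probabilities = {total:.6f}")
print(f"mean = {mean:.6f}")  # the mean of a Poisson distribution equals lam
```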

18_param_estimation_bootstrap.ipynb

This Jupyter Notebook covers parameter estimation with a focus on Bootstrap Confidence Intervals, explaining the process and techniques for estimating confidence intervals using the bootstrap method.
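The bootstrap procedure can be sketched as follows, using the percentile method, which is one common variant and an assumption here. The data values and the helper name `bootstrap_ci` are hypothetical.

```python
import random
import statistics


def bootstrap_ci(samples, stat, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(samples)."""
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(
        stat(rng.choices(samples, k=n)) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


data = [4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.1, 5.7, 4.4, 5.2]  # made-up sample
lower, upper = bootstrap_ci(data, statistics.mean)
print(f"95% bootstrap CI for the mean: ({lower:.2f}, {upper:.2f})")
```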

19_param_esitmation_maximum_likelihood.ipynb

This Jupyter Notebook presents parameter estimation through maximum likelihood (ML). It provides a practical understanding of likelihood and delves into the concept and significance of log likelihood in optimizing parameter estimates.
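As a hedged sketch of the idea, the example below estimates the success probability of a Bernoulli model by maximizing the log likelihood over a grid of candidate values. The model, the data, and the grid search are assumptions chosen for simplicity; the notebook may use a different distribution or optimizer.

```python
import math


def log_likelihood(p, flips):
    """Log likelihood of success probability p given 0/1 outcomes."""
    heads = sum(flips)
    tails = len(flips) - heads
    return heads * math.log(p) + tails * math.log(1 - p)


flips = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # made-up data: 7 successes in 10 trials

# Evaluate the log likelihood on a grid and keep the maximizer.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: log_likelihood(p, flips))

print(p_hat)  # 0.7, which matches the sample proportion 7/10
```

Working with the log turns a product of many small probabilities into a sum, which avoids numerical underflow and leaves the location of the maximum unchanged.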

9_group_by.ipynb

This Jupyter Notebook explores the groupby method, focusing on the split-apply-combine strategy for data aggregation, transformation, filtering, and thinning within groups. It offers a concise examination of how to efficiently manage and analyze grouped data in Python.
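The four split-apply-combine operations can be sketched on a small invented table as follows. Interpreting "thinning" as keeping only the first row of each group with `head` is an assumption, since the README does not define the term.

```python
import pandas as pd

df = pd.DataFrame(
    {"team": ["a", "a", "b", "b", "b"], "score": [1, 3, 10, 20, 30]}
)
grouped = df.groupby("team")

# Aggregation: one value per group.
totals = grouped["score"].sum()

# Transformation: a result aligned with the original rows.
centered = df["score"] - grouped["score"].transform("mean")

# Filtering: keep or drop whole groups based on a group-level condition.
big_groups = grouped.filter(lambda g: len(g) >= 3)

# Thinning (assumed meaning): keep a subset of rows from each group.
first_rows = grouped.head(1)

print(totals)
print(centered.tolist())
```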

10_hierarchical_indexes.ipynb

This Jupyter Notebook introduces Hierarchical Indexing, expanding upon its mention in our groupby discussions. It details how to implement multiple indexes on rows and/or columns. The concept of levels within a MultiIndex object is also explored, providing a foundational understanding of structured data manipulation and analysis.
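A short sketch of a two-level row index follows. The level names `year` and `quarter` and the values are invented for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
sales = pd.Series([10, 12, 15, 18], index=index)

n_levels = sales.index.nlevels              # number of index levels
y2024 = sales.loc["2024"]                   # select by the outer level
q1 = sales.xs("Q1", level="quarter")        # cross-section on the inner level
per_year = sales.groupby(level="year").sum()  # aggregate over one level
wide = sales.unstack("quarter")             # move the inner level to the columns

print(per_year)
print(wide)
```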

About

Name: Sidney Gills
