This week, we learned more about Python, including the following topics:
NumPy is a Python library that provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays. Arrays are efficient and fast for numerical operations. They enable you to work with large datasets more easily than standard Python lists.
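As a minimal sketch of the idea, the following builds a small two-dimensional array and applies arithmetic to the whole array at once. The variable names (matrix, doubled, total) are only illustrative.

```python
import numpy as np

# A small 2-D array (matrix) built from nested Python lists.
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# Arithmetic applies element-wise to the whole array, with no explicit loop.
doubled = matrix * 2

# Mathematical functions operate on the array as a whole.
total = matrix.sum()

print(doubled)
print(total)  # 21
```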
In NumPy, a "dimension" is an axis, that is, a direction along which the data can vary, so the number of dimensions of an array is its number of axes. The 'ndmin' parameter is used when creating an array to specify the minimum number of dimensions that the resulting array should have. If the input has fewer dimensions, NumPy adds axes of length 1 until that minimum is reached.
For example, the expression 'ndmin=7' requests an array with at least 7 dimensions; the number 7 can be replaced with any desired minimum.
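A short sketch of 'ndmin' in use, assuming a plain three-element list as input (the variable names are illustrative):

```python
import numpy as np

# Request at least 7 dimensions for a flat list of 3 values.
arr = np.array([1, 2, 3], ndmin=7)

print(arr.ndim)   # 7
print(arr.shape)  # (1, 1, 1, 1, 1, 1, 3)

# ndmin is only a minimum: an input that already has more
# dimensions keeps them.
grid = np.array([[1, 2], [3, 4]], ndmin=1)
print(grid.ndim)  # 2
```

The extra axes all have length 1, so the data itself is unchanged.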
Pandas is a Python library for efficient data manipulation and analysis. It simplifies data cleaning, transformation, and analysis tasks with its DataFrame and Series data structures. Pandas is widely used in data science to handle structured data, perform statistical operations, and work with various file formats.
A Pandas Series is like a column in a table. It is a one-dimensional array holding data of any type. It can also be thought of as a fixed-size dictionary, where the index labels map to the corresponding values.
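The dictionary-like behaviour can be sketched as follows; the labels and values are made up for illustration.

```python
import pandas as pd

# A Series with custom index labels; each label maps to one value.
scores = pd.Series([90, 85, 77], index=["alice", "bob", "carol"])

# Look up a value by its label, much like a dictionary key.
print(scores["bob"])  # 85

# The labels are stored in the index.
print(scores.index.tolist())  # ['alice', 'bob', 'carol']
```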
Slicing selects specific rows or columns from a data structure based on their labels or positions.
Explicit slicing retrieves a subset of data by referring to the explicit index, meaning the index labels that were specified, such as a label range or a single label; the last label of the range is included in the result. Implicit slicing retrieves a subset of data by referring to the implicit index, meaning the integer positions 0, 1, 2, and so on; as with Python lists, the last position of the range is not included in the result.
loc always refers to the explicit index, and iloc always refers to the implicit index. Using loc and iloc removes the inconsistencies that plain slicing can cause, for example when the index labels are themselves integers.
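The difference is easiest to see on a Series whose labels are integers that do not match the positions. This is a small sketch with made-up data:

```python
import pandas as pd

# Labels are 1, 3, 5 while positions are 0, 1, 2.
letters = pd.Series(["a", "b", "c"], index=[1, 3, 5])

# loc uses the explicit index (labels); the end label is included.
by_label = letters.loc[1:3]

# iloc uses the implicit index (positions); the end position is excluded.
by_position = letters.iloc[1:3]

print(by_label.tolist())     # ['a', 'b']
print(by_position.tolist())  # ['b', 'c']
print(letters.loc[1], letters.iloc[1])  # a b
```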
A DataFrame is a collection of one or more Series. It is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns), so it can be thought of as a table with rows and columns. For example, a DataFrame can be built from 3 Series, with each Series becoming one column.
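A minimal sketch of building a DataFrame from 3 Series, using invented names, ages, and cities:

```python
import pandas as pd

# Three Series with the same default index (0, 1, 2).
names = pd.Series(["Ani", "Budi", "Citra"])
ages = pd.Series([21, 25, 30])
cities = pd.Series(["Jakarta", "Bandung", "Medan"])

# Each dictionary key becomes a column label.
df = pd.DataFrame({"name": names, "age": ages, "city": cities})

print(df)
print(df.shape)  # (3, 3)
```

Selecting a single column, such as df["age"], gives back a Series.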
To import a CSV file in Python, you can use the Pandas library, which provides a simple and efficient way to work with structured data. Make sure the CSV file has been uploaded to the same folder as your code.
For example, the file 'Titanic.csv' is imported with the 'pd.read_csv()' function.
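The sketch below shows the call. Because 'Titanic.csv' may not be present where this runs, it reads a tiny made-up CSV text through io.StringIO instead; the columns id, name, and age are hypothetical and are not the Titanic columns.

```python
import io
import pandas as pd

# With the real file in the same folder, the call would be:
# df = pd.read_csv("Titanic.csv")

# Stand-in CSV content (hypothetical data) so the example runs anywhere.
csv_text = "id,name,age\n1,Ani,21\n2,Budi,25\n3,Citra,30\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```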
• head() displays the data starting from the top rows.
• The number of rows can be customized, for example head(10).
• By default, head() returns the top 5 rows.
• tail() returns a specified number of last rows.
• tail() returns the last 5 rows if a number is not specified.
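A short sketch of head() and tail() on a hypothetical 10-row DataFrame with a single column named n:

```python
import pandas as pd

# Ten rows holding the numbers 1 to 10.
df = pd.DataFrame({"n": range(1, 11)})

first_five = df.head()    # default: first 5 rows
first_three = df.head(3)  # custom number of rows
last_five = df.tail()     # default: last 5 rows
last_two = df.tail(2)     # custom number of rows

print(first_three["n"].tolist())  # [1, 2, 3]
print(last_two["n"].tolist())     # [9, 10]
```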
• The info() method prints information about the DataFrame.
• The information contains the number of columns, column labels, column data types, memory usage, range index, and the number of non-null values in each column.
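As a sketch, info() can be called directly to print the summary. It can also write into a text buffer through its buf parameter, which is used here only to inspect the text. The DataFrame is made up and contains one missing name.

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["Ani", "Budi", None], "age": [21, 25, 30]})

# Prints the summary to the screen and returns None.
df.info()

# Capture the same summary as a string by writing into a buffer.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
```

The non-null count for name is lower than the number of rows, which is a quick way to spot missing values.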
shape returns the number of rows and columns of the DataFrame as a pair (rows, columns).
For the Titanic data, 891 is the number of rows and 12 is the number of columns.
columns returns the label of each column in the DataFrame.
• index returns the index information of the DataFrame.
• The index information contains the labels of the rows. If the rows do not have named indexes, the index property returns a RangeIndex object with the start, stop, and step values.
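The three attributes shape, columns, and index can be sketched together on a small made-up DataFrame. Note that they are attributes, so they are written without parentheses.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ani", "Budi", "Citra"], "age": [21, 25, 30]})

print(df.shape)             # (3, 2)
print(df.columns.tolist())  # ['name', 'age']

# No row labels were given, so the index is a RangeIndex.
print(df.index)             # RangeIndex(start=0, stop=3, step=1)
```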
• sum() returns the sum of the values along the specified axis.
• By default, the sum() method adds all values in each column and returns the sum for each column.
• By specifying the column axis (axis='columns'), the sum() method works across the columns and returns the sum of each row.
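A small sketch of both directions, using a made-up DataFrame with columns a and b:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Default: one total per column.
column_totals = df.sum()

# axis='columns': add across the columns, giving one total per row.
row_totals = df.sum(axis="columns")

print(column_totals.tolist())  # [6, 60]
print(row_totals.tolist())     # [11, 22, 33]
```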
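isnull() is used to find NULL (missing) values: it returns True where a value is missing and False otherwise.
notnull() is used to find values that are NOT NULL: it returns True where a value is present and False otherwise.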
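The two methods can be sketched on a made-up DataFrame with one missing value per column. Chaining sum() after isnull() is a common way to count missing values per column, since True counts as 1.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [21, None, 30],
    "city": ["Jakarta", "Bandung", None],
})

missing = df.isnull()    # True where a value is missing
present = df.notnull()   # True where a value is present

# Count the missing values in each column.
missing_per_column = df.isnull().sum()

print(missing["age"].tolist())  # [False, True, False]
print(missing_per_column.tolist())  # [1, 1]
```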
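describe() returns a descriptive summary (count, mean, standard deviation, minimum, quartiles, and maximum) for each column in the DataFrame. By default, only the numeric columns of a mixed-type DataFrame are summarized.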
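A minimal sketch of describe() on a made-up DataFrame with two numeric columns and one text column:

```python
import pandas as pd

df = pd.DataFrame({
    "age": [20, 30, 40],
    "fare": [10.0, 20.0, 60.0],
    "city": ["Jakarta", "Bandung", "Medan"],
})

summary = df.describe()

print(summary)
print(summary.loc["mean", "age"])  # 30.0
```

The text column city does not appear in the result, because only numeric columns are summarized by default here.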
• Mean: the average value.
• mean() returns the mean of the values along the specified axis.
• Median: the midpoint value.
• median() returns the median of the values along the specified axis.
• Mode: the most common value.
• mode() returns the mode of the values along the specified axis.
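The three measures can be compared on one made-up column. Note that mode() returns a Series rather than a single number, because several values can be tied for most common.

```python
import pandas as pd

ages = pd.Series([20, 20, 30, 50])

average = ages.mean()     # (20 + 20 + 30 + 50) / 4
midpoint = ages.median()  # halfway between the two middle values, 20 and 30
most_common = ages.mode() # Series of the most frequent value(s)

print(average)               # 30.0
print(midpoint)              # 25.0
print(most_common.tolist())  # [20]
```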
min() returns the minimum of the values along the specified axis.
max() returns the maximum of the values along the specified axis.
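A short sketch of min() and max(), again with a made-up DataFrame, showing the default per-column result and the per-row result with axis='columns':

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2], "b": [30, 10, 20]})

column_min = df.min()            # smallest value in each column
column_max = df.max()            # largest value in each column
row_max = df.max(axis="columns") # largest value in each row

print(column_min.tolist())  # [1, 10]
print(column_max.tolist())  # [3, 30]
print(row_max.tolist())     # [30, 10, 20]
```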