# NumPy Getting Started Guide

NumPy, short for Numerical Python, is a fundamental library for scientific computing in Python. It provides a fast multidimensional array object, which also covers vectors and matrices, along with a large collection of mathematical functions that operate on these arrays. It is also the foundation on which the Pandas library is built.
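This short sketch shows why arrays matter. Arithmetic on a NumPy array is applied element by element, whereas the same operator on a plain Python list does something different. The prices are made-up values used only for illustration.

```python
import numpy as np

# Python lists: "*" repeats the list instead of multiplying each element
prices_list = [10, 20, 30]
print(prices_list * 2)  # [10, 20, 30, 10, 20, 30]

# NumPy arrays: arithmetic is applied to every element
prices = np.array([10, 20, 30])
print(prices * 2)       # [20 40 60]
print(prices.mean())    # 20.0
```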

## Random numbers and basic statistics

```python
import numpy as np

rng = np.random.default_rng()

print(rng)
```

```
Generator(PCG64)
```


```python
random_number = rng.random()        # float in the half-open interval [0, 1)
random_number2 = rng.random() * 10  # scaled to the interval [0, 10)

print(random_number)
print(random_number2)
```

```
0.8586586408535858
1.904963446070077
```
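Multiplying by 10 only stretches the range to [0, 10). To target an arbitrary interval [low, high), you can shift the result as well as scale it. `Generator.uniform` does the same in one call. The bounds 5 and 15 and the seed are arbitrary choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen only to make the sketch repeatable

low, high = 5.0, 15.0

# rng.random() lies in [0, 1), so low + (high - low) * u lies in [low, high)
manual = low + (high - low) * rng.random(5)

# Generator.uniform draws from the half-open interval [low, high) directly
direct = rng.uniform(low, high, size=5)

print(manual)
print(direct)
```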


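```python
random_array = rng.random(3)  # array of 3 floats in [0, 1)
print(random_array)

random_array2 = rng.random(3) * 100  # the multiplication applies to every element: [0, 100)
print(random_array2)
```

```
[0.86557138 0.87972105 0.91710165]
[12.92457409 98.42831672 74.80209901]
```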
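The `size` argument also accepts a tuple, which produces a multi-dimensional array instead of a flat one. The sketch below assumes a 2 × 3 shape and an arbitrary seed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed for repeatability

# A tuple as size gives a 2-D array with 2 rows and 3 columns
grid = rng.random((2, 3))
print(grid.shape)  # (2, 3)

# Element-wise scaling works the same way on 2-D arrays
scaled = grid * 100
print(scaled.round(2))
```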


Let's look at a scenario where this kind of random data is useful: a sales analysis.

Suppose you are a sales analyst at a company and want to better understand the sales performance of a specific product. However, you don't have access to the actual sales data, so you decide to generate some random sales data to perform your analysis.

```python
# Generate fake sales data for 30 days.
# Assume daily sales of the product range from 50 to 199 units
# (the `high` argument of integers() is exclusive).

rng = np.random.default_rng(seed=42)  # fixed seed, so the results are reproducible
data_sales = rng.integers(low=50, high=200, size=30)
print(data_sales)
```

```
[ 63 166 148 115 114 178  62 154  80  64 128 196 160 164 157 167 126  69
 175 117 125 105  77 189 167 146 110 173 131 116]
```
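The seed is what makes this example reproducible. Generators created with the same seed produce the same sequence, so re-running the cell gives the same sales data. The sketch below also shows the `endpoint=True` option of `integers`, which makes `high` inclusive.

```python
import numpy as np

# Two independent generators with the same seed produce identical draws
a = np.random.default_rng(seed=42).integers(low=50, high=200, size=30)
b = np.random.default_rng(seed=42).integers(low=50, high=200, size=30)
print(np.array_equal(a, b))  # True

# Default: high is exclusive, so values fall in 50..199.
# endpoint=True includes high, so values fall in 50..200.
c = np.random.default_rng(seed=42).integers(low=50, high=200, size=30, endpoint=True)
print(c.min() >= 50, c.max() <= 200)  # True True
```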


Now you can use this data for various analyses. For example, you may want to know which day had the highest sales, which day had the lowest, and what the average daily sales were over the month. Here is how you can do this:


```python
highest_sales_day = np.argmax(data_sales) + 1  # +1 converts the 0-based index into a 1-based day number
lowest_sales_day = np.argmin(data_sales) + 1
mean_sales = np.mean(data_sales)

print(f'The day with the highest sales was {highest_sales_day}, the one with the lowest was {lowest_sales_day}, and the mean was {mean_sales}.')
```

```
The day with the highest sales was 12, the one with the lowest was 7, and the mean was 131.4.
```
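`np.argmax` and `np.argmin` return positions, and only the first position when several days tie. This sketch reuses the sales values printed above to retrieve the extreme values themselves. It then uses `np.flatnonzero` to list every tied day. The short `tied` series is hypothetical and exists only to demonstrate a tie.

```python
import numpy as np

# Daily sales copied from the printed output above (30 days)
data_sales = np.array([63, 166, 148, 115, 114, 178, 62, 154, 80, 64,
                       128, 196, 160, 164, 157, 167, 126, 69, 175, 117,
                       125, 105, 77, 189, 167, 146, 110, 173, 131, 116])

# The extreme sales values themselves, not just the days they occurred on
print(data_sales.max(), data_sales.min())  # 196 62

# All days (1-based) whose sales equal the maximum
best_days = np.flatnonzero(data_sales == data_sales.max()) + 1
print(best_days)  # [12]

# Hypothetical series with a tied maximum on days 2 and 4
tied = np.array([120, 180, 95, 180])
print(np.argmax(tied) + 1)                     # 2 (argmax reports only the first tie)
print(np.flatnonzero(tied == tied.max()) + 1)  # [2 4]
```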


Beyond the maximum, minimum, and mean, a few other statistical measures are useful for summarizing data. Here is a brief, simplified summary of each:

1. Median:
The median is the value that splits a data set into two equal halves. To find it, sort the data in ascending or descending order and take the middle value. If the number of values is odd, the median is exactly the central value. If it is even, the median is the average of the two middle values.

2. Percentile:
A percentile indicates the relative position of a value within a data set: the pth percentile is the value below which p% of the data falls. For example, the 50th percentile (which is the median) splits the data into two equal parts, with 50% of the values below it and 50% above it.

3. Standard deviation:
The standard deviation measures how spread out the values in a data set are around the mean. A larger standard deviation means the values are more spread out, while a smaller one means they are clustered closer to the mean. It is the square root of the variance, so it is expressed in the same units as the data.

4. Variance:
Variance is another measure of dispersion. It is calculated as the average of the squared differences between each value and the mean. Because the differences are squared, values above and below the mean contribute in the same way, and the result is expressed in squared units of the data.

These measures are widely used in statistics to summarize and analyze data sets. Together they describe the distribution, variability, and relative position of the values, giving a more complete picture of the data than the mean alone.
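To connect the definitions above to code, here is a from-scratch sketch of the median, variance, and standard deviation. It uses plain Python on a small hypothetical data set and is checked against the NumPy functions. The variance divides by n, which matches NumPy's default.

```python
import math
import numpy as np

values = [4, 8, 6, 2, 10, 6]  # small hypothetical data set with an even count

# Median: sort, then take the middle value (or the mean of the two middle values)
s = sorted(values)
n = len(s)
mid = n // 2
median = s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

# Variance: average of the squared differences from the mean (divides by n)
mean = sum(values) / n
variance = sum((x - mean) ** 2 for x in values) / n

# Standard deviation: square root of the variance, back in the data's units
std = math.sqrt(variance)

print(median, variance, std)
print(np.median(values), np.var(values), np.std(values))
```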

```python
print(np.median(data_sales))          # median
print(np.percentile(data_sales, 50))  # 50th percentile (same as the median)
print(np.std(data_sales))             # standard deviation (divides by n, ddof=0)
print(np.var(data_sales))             # variance (divides by n, ddof=0)
```

```
129.5
129.5
39.305300745149715
1544.9066666666665
```
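Two common follow-ups are sketched below, again reusing the sales values printed earlier. The first computes several percentiles in one call: the quartiles at 25, 50, and 75. The second uses the `ddof` parameter. `np.var` and `np.std` divide by n by default. Passing `ddof=1` divides by n − 1 instead, which is often preferred when the 30 days are treated as a sample of a larger population.

```python
import numpy as np

# Daily sales copied from the printed output above (30 days)
data_sales = np.array([63, 166, 148, 115, 114, 178, 62, 154, 80, 64,
                       128, 196, 160, 164, 157, 167, 126, 69, 175, 117,
                       125, 105, 77, 189, 167, 146, 110, 173, 131, 116])

# Quartiles in one call: 25th, 50th (median), and 75th percentiles
q1, q2, q3 = np.percentile(data_sales, [25, 50, 75])
print(q1, q2, q3)

# Default ddof=0 divides by n; ddof=1 divides by n - 1 (sample estimate)
print(np.var(data_sales), np.var(data_sales, ddof=1))
print(np.std(data_sales), np.std(data_sales, ddof=1))
```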
