With a calculated sample size of 355 for the high sales data, we can proceed with simple random sampling from the high_sales_states dataset. Because every record has the same chance of being selected, this helps keep the sample representative of the population and supports the independence of observations.

Here's how we can implement random sampling to obtain a sample of 355 records:

In [2]:
import pandas as pd

# Load the dataset
file_path = 'C:\\Users\\loydt\\Downloads\\Projects\\Superstore Sales Dataset.xlsx'
data = pd.read_excel(file_path)

In [3]:
# Filter the five states generating higher sales
states_of_interest = ['Washington', 'California', 'New York', 'Florida', 'Pennsylvania']
state_data = data[data['State'].isin(states_of_interest)]

# Extract relevant columns
high_sales_states = state_data[['Segment', 'State', 'City', 'Region', 'Ship Mode', 'Order Date', 'Category', 'Sub-Category', 'Product Name', 'Sales']]

# Perform random sampling
sample_size = 355
random_sample = high_sales_states.sample(n=sample_size, random_state=42)  # random_state for reproducibility

print(random_sample)

        Segment         State           City Region       Ship Mode  \
3529   Consumer       Florida        Orlando  South  Standard Class   
7998  Corporate    Washington        Seattle   West     First Class   
6359  Corporate    Washington        Seattle   West  Standard Class   
7621   Consumer    California    Los Angeles   West  Standard Class   
973    Consumer  Pennsylvania   Philadelphia   East    Second Class   
...         ...           ...            ...    ...             ...   
452    Consumer      New York         Auburn   East    Second Class   
6484  Corporate      New York  New York City   East  Standard Class   
6582  Corporate    Washington        Edmonds   West  Standard Class   
5116   Consumer      New York  New York City   East  Standard Class   
5867  Corporate  Pennsylvania   Philadelphia   East  Standard Class   

     Order Date         Category Sub-Category  \
3529 2016-09-17       Technology  Accessories   
7998 2015-05-04        Furniture  Furnishings   


**Explanation:**
 - We use the DataFrame.sample() method from pandas to randomly select 355 rows from the high_sales_states DataFrame. By default it samples without replacement, so no row is selected twice.
 - Setting random_state makes the sampling reproducible: running the code again with the same seed and the same input data returns the same sample.
 - After executing this code, random_sample will contain the randomly selected observations for further analysis.
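
The points above can be checked with a short sketch. It runs on a small synthetic DataFrame rather than the Superstore data, so the names `demo`, `first`, `second`, and `comparison` are hypothetical stand-ins and the values are made up for illustration. It confirms that two draws with the same `random_state` match, that no row is repeated, and it compares each state's share in the sample with its share in the population as a rough representativeness check.

```python
import pandas as pd

# Hypothetical stand-in for high_sales_states (synthetic rows, not the real dataset)
states = ['Washington', 'California', 'New York', 'Florida', 'Pennsylvania']
demo = pd.DataFrame({
    'State': [states[i % len(states)] for i in range(1000)],
    'Sales': [float(i) for i in range(1000)],
})

sample_size = 355

# sample() draws without replacement by default, so n must not exceed the row count
if sample_size > len(demo):
    raise ValueError("sample_size is larger than the number of available rows")

# Two draws with the same seed should select the same rows in the same order
first = demo.sample(n=sample_size, random_state=42)
second = demo.sample(n=sample_size, random_state=42)

same_rows = first.index.equals(second.index)
no_duplicates = first.index.is_unique

# Compare each state's share in the population with its share in the sample
comparison = pd.DataFrame({
    'population': demo['State'].value_counts(normalize=True),
    'sample': first['State'].value_counts(normalize=True),
})

print("Same rows with same seed:", same_rows)
print("No repeated rows:", no_duplicates)
print(comparison)
```

To run the same checks on the real data, replace `demo` with `high_sales_states`. Large gaps between the two columns of `comparison` would suggest the sample under- or over-represents a state.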