<a href="https://colab.research.google.com/github/halbtax/myFirstRepo/blob/master/CAS_1_pandas_solution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pandas Basic Operations

In this notebook, we'll be working with a simple dataset on fictional book sales to practice some basic pandas operations. Let's get started!

## Step 1: Setup

*You may ignore the code in this cell. You must still execute it.*

The cell below creates the dataset.

In [4]:
# Sample data for demonstration.

data = """Title,Author,Genre,Price,Sold
The Great Tale,John Doe,Fiction,15,1500
Mystery of the Night,Jane Smith,Mystery,20,890
Learning Pandas,Alice Johnson,Education,25,340
Journey to the Stars,Bob Brown,Sci-fi,18,2200
History Repeats,Peter G.,History,22,500
"""

with open("book_sales.csv", "w") as file:
    file.write(data)

## Step 2: Reading the Dataset

We will start by loading our dataset. This dataset contains the sales data of fictional books.


In [13]:
import pandas

df = pandas.read_csv("book_sales.csv")

df.head()

Unnamed: 0,Title,Author,Genre,Price,Sold
0,The Great Tale,John Doe,Fiction,15,1500
1,Mystery of the Night,Jane Smith,Mystery,20,890
2,Learning Pandas,Alice Johnson,Education,25,340
3,Journey to the Stars,Bob Brown,Sci-fi,18,2200
4,History Repeats,Peter G.,History,22,500


Alternatively, you can read `.xlsx` files with the command `pandas.read_excel('excel_file.xlsx', sheet_name='sheet_name')`.

Writing out `pandas.read_csv` every time can be tedious. An alternative is to create an abbreviation for the pandas library. You can do this with `import pandas as pd`. Now try to load the dataset using this abbreviation.

In [None]:
import pandas as pd

df = pd.read_csv("book_sales.csv")

df.head()
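
A quick way to confirm that the file loaded as intended is to look at the shape and the column types. The sketch below is only an illustration: it keeps the same rows in memory with `io.StringIO` (an assumption made here so the example runs without `book_sales.csv` on disk) and uses the `pd` abbreviation.

```python
import io

import pandas as pd

# Same rows as book_sales.csv, held in memory so the example is self-contained.
csv_text = """Title,Author,Genre,Price,Sold
The Great Tale,John Doe,Fiction,15,1500
Mystery of the Night,Jane Smith,Mystery,20,890
Learning Pandas,Alice Johnson,Education,25,340
Journey to the Stars,Bob Brown,Sci-fi,18,2200
History Repeats,Peter G.,History,22,500
"""

df = pd.read_csv(io.StringIO(csv_text))

# shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# dtypes lists the type pandas inferred for each column;
# Price and Sold are parsed as integers.
print(df.dtypes)
```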

## Step 3: Describing the Data

Let's get a quick description of our dataset using the `describe()` method, which summarizes the numeric columns.


In [14]:
df.describe()

Unnamed: 0,Price,Sold
count,5.0,5.0
mean,20.0,1086.0
std,3.807887,766.602896
min,15.0,340.0
25%,18.0,500.0
50%,20.0,890.0
75%,22.0,1500.0
max,25.0,2200.0
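
The numbers in this summary can also be computed one at a time, which helps when checking what each row means. The following sketch uses only the `Price` values from the dataset; the variable names are chosen for illustration.

```python
import pandas as pd

# Price column of the book sales dataset.
prices = pd.Series([15, 20, 25, 18, 22], name="Price")

mean_price = prices.mean()       # arithmetic mean
std_price = prices.std()         # sample standard deviation (divides by n - 1)
q25 = prices.quantile(0.25)      # first quartile, the "25%" row
summary = prices.describe()      # all summary statistics at once

print(mean_price, round(std_price, 6), q25)
```

Note that `std()` uses the sample standard deviation (dividing by `n - 1`), which is why the value differs slightly from the population formula.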


## Step 4: Exploring Columns and Rows

We can easily see the columns in our dataframe and extract or remove them.


In [None]:
df.columns

In [None]:
prices = df["Price"]
prices

You can also select multiple columns from the dataframe. For example, the following command selects the columns `Title` and `Author`.

In [None]:
df[["Title", "Author"]]
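
The number of brackets matters here. As a small sketch on a reduced version of the dataset (three rows, chosen only to keep the example short), single brackets return a `Series`, while a list of column names returns a `DataFrame`, even if the list has one entry.

```python
import pandas as pd

# Reduced version of the dataset for illustration.
df = pd.DataFrame(
    {
        "Title": ["The Great Tale", "Mystery of the Night", "Learning Pandas"],
        "Author": ["John Doe", "Jane Smith", "Alice Johnson"],
        "Price": [15, 20, 25],
    }
)

one_column = df["Price"]          # single brackets: a Series
as_frame = df[["Price"]]          # list with one name: a DataFrame
two_columns = df[["Title", "Author"]]

print(type(one_column).__name__, one_column.shape)
print(type(as_frame).__name__, as_frame.shape)
print(two_columns.shape)
```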

In [None]:
df_without_sold = df.drop(columns=["Sold"])
df_without_sold
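
It is worth noting that `drop` returns a new dataframe and leaves the original untouched. A minimal sketch, using a two-row frame made up for illustration:

```python
import pandas as pd

# Small illustrative frame with the same column names as the dataset.
df = pd.DataFrame(
    {
        "Title": ["The Great Tale", "Learning Pandas"],
        "Price": [15, 25],
        "Sold": [1500, 340],
    }
)

df_without_sold = df.drop(columns=["Sold"])

# The result lacks the column, but df itself still has it.
print(list(df_without_sold.columns))
print(list(df.columns))
```

To actually discard the column, assign the result back, for example `df = df.drop(columns=["Sold"])`.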

We can also retrieve rows from a dataset. The following command selects the first 3 rows of the dataset.

In [None]:
df[0:3]

Note that rows are numbered 0, 1, 2, 3, ... In general, for non-negative integers `a` < `b`, the command `df[a:b]` selects rows `a`, `a+1`, `a+2`, `...`, `b-1` of the dataset, so row `b` itself is excluded. How can you select rows 2 and 3 from your dataset?

In [None]:
df[2:4]
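
The same rows can be selected more explicitly with `iloc` (by position) or `loc` (by label). The sketch below assumes the default index 0, 1, 2, ..., as in this notebook, and keeps only the `Title` column for brevity.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Title": [
            "The Great Tale",
            "Mystery of the Night",
            "Learning Pandas",
            "Journey to the Stars",
            "History Repeats",
        ]
    }
)

by_slice = df[2:4]        # rows 2 and 3; the end position is excluded
by_iloc = df.iloc[2:4]    # position-based, same rule as above
by_loc = df.loc[2:3]      # label-based; here the end label is included

print(by_slice["Title"].tolist())
```

The difference between `iloc` and `loc` only becomes visible at the end of the range: `iloc[2:4]` stops before position 4, whereas `loc[2:3]` includes label 3.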

## Step 5: Sorting Rows in a DataFrame

We can sort the rows in our dataframe by any column. By default, `sort_values` sorts in ascending order and returns a new dataframe.

In [None]:
df_sorted_by_price = df.sort_values(by="Price")
df_sorted_by_price
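
Sorting keeps each row's original index label, and the order can be reversed with `ascending=False`. The following sketch illustrates both points on the `Title` and `Price` columns; `reset_index(drop=True)` is shown as one possible way to renumber the rows afterwards.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Title": [
            "The Great Tale",
            "Mystery of the Night",
            "Learning Pandas",
            "Journey to the Stars",
            "History Repeats",
        ],
        "Price": [15, 20, 25, 18, 22],
    }
)

cheapest_first = df.sort_values(by="Price")
priciest_first = df.sort_values(by="Price", ascending=False)

# Rows keep their original labels after sorting.
print(cheapest_first.index.tolist())

# Renumber the rows 0, 1, 2, ... and discard the old labels.
renumbered = priciest_first.reset_index(drop=True)
print(renumbered)
```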

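## Step 6: Computing Revenue

The revenue of each book is its price multiplied by the number of copies sold. We store the result in a new column called `Revenue`.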

In [16]:
df["Revenue"] = df["Price"] * df["Sold"]
df

Unnamed: 0,Title,Author,Genre,Price,Sold,Revenue
0,The Great Tale,John Doe,Fiction,15,1500,22500
1,Mystery of the Night,Jane Smith,Mystery,20,890,17800
2,Learning Pandas,Alice Johnson,Education,25,340,8500
3,Journey to the Stars,Bob Brown,Sci-fi,18,2200,39600
4,History Repeats,Peter G.,History,22,500,11000
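
Arithmetic between two columns is applied row by row, so no explicit loop is needed. As a small sketch using only the `Price` and `Sold` values from the dataset, the new column can then be aggregated, for example into a total (the name `total_revenue` is chosen here for illustration).

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Price": [15, 20, 25, 18, 22],
        "Sold": [1500, 890, 340, 2200, 500],
    }
)

# Element-wise product: row i of Revenue is Price[i] * Sold[i].
df["Revenue"] = df["Price"] * df["Sold"]

# Sum over all books.
total_revenue = df["Revenue"].sum()
print(total_revenue)
```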


## Step 7: Top 3 Bestsellers

Now you can apply what you have learned in this notebook. Compute the top 3 bestsellers, ranked by revenue, and show their titles and authors.

In [None]:
df_sorted_by_revenue = df.sort_values(by="Revenue", ascending=False)
df_sorted_by_revenue[0:3][["Title", "Author"]]
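
As an alternative to sorting and slicing, pandas also offers `DataFrame.nlargest`. The sketch below rebuilds the relevant columns, with the `Revenue` values taken from Step 6, and checks that both approaches agree on this dataset.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Title": [
            "The Great Tale",
            "Mystery of the Night",
            "Learning Pandas",
            "Journey to the Stars",
            "History Repeats",
        ],
        "Author": ["John Doe", "Jane Smith", "Alice Johnson", "Bob Brown", "Peter G."],
        "Revenue": [22500, 17800, 8500, 39600, 11000],
    }
)

# Approach used in this notebook: sort descending, then take the first 3 rows.
top3_sorted = df.sort_values(by="Revenue", ascending=False)[0:3]

# Alternative: ask directly for the 3 rows with the largest Revenue.
top3_nlargest = df.nlargest(3, "Revenue")

print(top3_nlargest[["Title", "Author"]])
```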