### Pandas Data Exploration and Manipulation

First, we import the pandas library, which is essential for data manipulation and analysis in Python.

In [1]:
import pandas as pd


### Data Loading and Initial Inspection
We load a dataset from a CSV file into a pandas DataFrame. This is the starting point for our data analysis.

In [2]:
df = pd.read_csv('new_csv.csv')

FileNotFoundError: [Errno 2] No such file or directory: 'new_csv.csv'

The `.shape` attribute returns a tuple representing the dimensions (rows, columns) of the DataFrame.

In [None]:
df.shape

`.head()` is used to display the first few rows of the DataFrame. It's a quick way to get a feel for the data.

In [None]:
df.head()

`.describe()` provides a statistical summary of the numerical columns in the DataFrame, including measures like mean, standard deviation, and quartiles.

In [None]:
df.describe()

`.info()` offers a concise summary of the DataFrame, including data types and the number of non-null values in each column.

In [None]:
df.info()
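
As a self-contained illustration of the four inspection tools above, here is a minimal sketch on a small, made-up DataFrame. The `demo` frame and its values are hypothetical and are not the contents of `new_csv.csv`; only the column names are borrowed from later cells.

```python
import pandas as pd

# Hypothetical stand-in data, not the real dataset.
demo = pd.DataFrame(
    {
        "city": ["Pune", "Mumbai", "Delhi", "Goa"],
        "price": [120, 250, 90, 300],
        "rating": [4.5, 3.0, None, 5.0],  # one missing value on purpose
    }
)

# .shape is an attribute (no parentheses) holding (rows, columns).
n_rows, n_cols = demo.shape

# .head(n) returns the first n rows; the default is 5.
first_two = demo.head(2)

# .describe() summarises numeric columns only, so 'city' is left out.
summary = demo.describe()

# .info() prints dtypes and non-null counts; 'rating' shows 3 non-null values.
demo.info()
```

Note that `count` in the `describe()` output ignores missing values, which is why it can differ from the row count reported by `.shape`.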

### Data Cleaning and Transformation
The 'Unnamed: 0' column is often an artifact of saving a DataFrame to CSV without specifying `index=False`. We can remove it as it is redundant.

In [None]:
df.drop(columns=['Unnamed: 0'], inplace=True)
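
To see where an 'Unnamed: 0' column comes from, the sketch below round-trips a small, hypothetical DataFrame through CSV text in memory. It then shows two ways of handling the artifact: dropping it after loading, or reading the first column as the index. The data and variable names are illustrative assumptions.

```python
import io
import pandas as pd

# Hypothetical toy data; not the real dataset.
original = pd.DataFrame({"city": ["Pune", "Mumbai"], "price": [120, 250]})

# to_csv() writes the index by default, producing an unnamed first column.
csv_text = original.to_csv()

# Reading it back without index_col yields the 'Unnamed: 0' column.
reloaded = pd.read_csv(io.StringIO(csv_text))
has_artifact = "Unnamed: 0" in reloaded.columns

# Option 1: drop it after loading. errors="ignore" keeps the cell safe to
# re-run, because no KeyError is raised once the column is already gone.
cleaned = reloaded.drop(columns=["Unnamed: 0"], errors="ignore")

# Option 2: treat the first CSV column as the index while reading.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
```

Returning a new DataFrame, as `cleaned` does here, is an alternative to `inplace=True` that makes re-running notebook cells less error-prone.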

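The `map` method is used here with a dictionary to translate each category label in the 'age' column into a representative number, stored in a new 'new_age' column. Any label that is missing from the dictionary becomes `NaN`. Encoding categories as numbers is a common preprocessing step for many machine learning models.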

In [None]:
new_age = {'Adult': 25, 'Senior': 60, 'Teenage': 15}
df['new_age'] = df['age'].map(new_age)
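
One detail of dictionary-based mapping is worth checking in practice: labels that are not keys in the dictionary come back as `NaN`. The sketch below uses a hypothetical Series in which 'Child' is an extra label invented for illustration.

```python
import pandas as pd

# Hypothetical labels; 'Child' is deliberately absent from the mapping.
ages = pd.Series(["Adult", "Senior", "Teenage", "Child"])
age_map = {"Adult": 25, "Senior": 60, "Teenage": 15}

# Each label is looked up in the dictionary; unknown labels become NaN.
mapped = ages.map(age_map)

# Counting the NaN values is a quick check for unmapped categories.
unmapped_count = mapped.isna().sum()
```

If `unmapped_count` is greater than zero, the mapping is incomplete and later numeric comparisons will silently skip those rows.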

### Indexing and Selection
Now, let's explore different ways to select data from the DataFrame.

`iloc` is used for integer-location based indexing. Here, we select the first 11 rows and the columns at index positions 3, 8, and 11.

In [None]:
df.iloc[0:11, [3, 8, 11]]

`loc` is used for label-based indexing. Unlike `iloc`, a `loc` slice includes its end label, so with the default integer index `0:11` selects the first 12 rows, along with the columns specified by name.

In [None]:
df.loc[0:11, ['city', 'dish_name', 'price']]
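
The difference in slice endpoints between `iloc` and `loc` is easy to verify on a small frame. This is a minimal sketch with hypothetical data and a default integer index.

```python
import pandas as pd

# Hypothetical 15-row frame with the default RangeIndex (labels 0..14).
demo = pd.DataFrame({"city": list("abcdefghijklmno"), "price": range(15)})

# iloc slices by position and excludes the stop value: positions 0..10.
by_position = demo.iloc[0:11]

# loc slices by label and includes the stop label: labels 0..11.
by_label = demo.loc[0:11]

rows_by_position = len(by_position)
rows_by_label = len(by_label)
```

With a non-default index (for example, after filtering or sorting), positions and labels no longer coincide, so the two accessors can return different rows for the same slice.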

`.at` provides fast access to a single value by label.

In [None]:
df.at[2,'city']

`.iat` is the integer-based equivalent of `.at`, providing quick access to a single value by its integer position.

In [None]:
df.iat[2, 3]
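
The following sketch contrasts the two scalar accessors on a hypothetical three-row frame. Here 'city' is the first column, so its integer position is 0; in the real dataset the position would depend on the column order.

```python
import pandas as pd

# Hypothetical data for illustration only.
demo = pd.DataFrame(
    {"city": ["Pune", "Delhi", "Goa"], "price": [100, 200, 300]}
)

# .at takes a row label and a column label.
value_by_label = demo.at[2, "city"]

# .iat takes a row position and a column position.
value_by_position = demo.iat[2, 0]

# Both accessors also support assignment of a single cell.
demo.at[0, "price"] = 110
```

Both accessors handle exactly one cell; for slices or lists of labels, `loc` and `iloc` are the appropriate tools.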

### Filtering and Querying
This section demonstrates how to filter data based on specific conditions.

Boolean indexing is a powerful way to filter data. This command selects all rows where the `new_age` is greater than 18 and the `rating` is greater than 3.

In [None]:
df[(df['new_age'] > 18) & (df['rating'] > 3)]

The `query` method provides a more readable way to perform the same filtering operation using a string-based query.

In [None]:
df.query('new_age > 18 and rating > 3')
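
To confirm that a boolean mask and an equivalent query string select the same rows, the two results can be compared directly. The sketch below uses hypothetical values for `new_age` and `rating`.

```python
import pandas as pd

# Hypothetical data for illustration only.
demo = pd.DataFrame({"new_age": [15, 25, 60, 25], "rating": [5, 4, 2, 3]})

# Each comparison needs its own parentheses, because & binds more tightly
# than > in Python.
mask_result = demo[(demo["new_age"] > 18) & (demo["rating"] > 3)]

# query() accepts the word 'and' and needs no parentheses here.
query_result = demo.query("new_age > 18 and rating > 3")

# DataFrame.equals checks that values, index and dtypes all match.
same = mask_result.equals(query_result)
```

Only the row with `new_age` 25 and `rating` 4 satisfies both conditions, and both approaches return it with its original index label.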

We can chain conditions to create more complex filters. Here, we select customers who have more than 100 loyalty points and are 25 years old.

In [None]:
df[(df['loyalty_points'] > 100) & (df['new_age'] == 25)]

This is another example of filtering using `query` for the same conditions.

In [None]:
df.query('loyalty_points > 100 and new_age == 25')

To count the number of customers who paid with cash, we can create a boolean Series and then use `.sum()`. True is treated as 1 and False as 0.

In [None]:
(df['payment_method'] == 'Cash').sum()
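
The same boolean Series can answer related questions. As a minimal sketch on hypothetical payment data, `.sum()` gives the count, `.mean()` gives the proportion, and `value_counts()` tallies every category at once.

```python
import pandas as pd

# Hypothetical payment methods for illustration only.
payments = pd.Series(["Cash", "Card", "Cash", "UPI"], name="payment_method")

# True counts as 1 and False as 0, so the sum is the number of matches.
cash_count = (payments == "Cash").sum()

# The mean of a boolean Series is the fraction of True values.
cash_share = (payments == "Cash").mean()

# value_counts() returns the count for every distinct value.
counts = payments.value_counts()
```

`value_counts()` is usually the more convenient choice when counts for all payment methods are needed rather than for a single one.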