# Python: Starting with the basics

Understanding the basic structure of Python helps us write correct, efficient, and maintainable code, and make use of Python's large ecosystem of libraries and frameworks.

1. Syntax: Every programming language has its own syntax, the set of rules for how code in that language must be written. Knowing Python's syntax lets you write code that the interpreter accepts without syntax errors.

2. Program flow: Every program has a flow, which is the order in which statements are executed. Understanding the flow of a Python program lets you write programs that behave as expected and debug them when things go wrong.

3. Data types and structures: Python has a rich set of built-in data types and structures, including strings, numbers, lists, and dictionaries. Understanding them lets you write programs that manipulate and process data in meaningful ways.



In [1]:
# Importing necessary libraries
import pandas as pd
import numpy as np

In [2]:
# Reading the csv file
df = pd.read_csv('28.5k_categorized.csv')
df.head(2)

Unnamed: 0,article_link,title,author,category,article_content,article_tokens
0,https://www.axios.com/2023/03/13/silicon-valle...,Silicon Valley Bank's U.K. arm sold to HSBC fo...,Kia Kokalitcheva,Economy & Business,"Silicon Valley Bank's, U.K. business has been ...","['silicon', 'valley', 'bank', 'business', 'sel..."
1,https://www.axios.com/2023/03/13/biden-alaska-...,Biden to protect 16M acres in Alaska as oil pr...,"Rebecca Falconer,Andrew Freedman",Energy & Environment,The Biden administration is moving to protect ...,"['biden', 'administration', 'protect', 'millio..."


## Variables
- Variables are names that refer to values in Python. You can store many types of data in variables, including:
    1. Numbers - Integers, floats, and complex numbers.
    2. Strings - Textual data enclosed in quotes.
    3. Boolean values - True or False.
    4. Lists - Ordered sequences of values, enclosed in square brackets.
    5. Tuples - Immutable ordered sequences of values, enclosed in parentheses.
    6. Sets - Unordered collections of unique values, enclosed in curly braces.
    7. Dictionaries - Collections of key-value pairs, enclosed in curly braces, with each key separated from its value by a colon. Since Python 3.7, dictionaries preserve insertion order.
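
As a minimal sketch of the list above, here is one variable of each type. All names and values are chosen for illustration only.

```python
# One variable per type from the list above (illustrative values).
article_count = 198                      # int
average_tokens = 412.5                   # float
complex_value = 3 + 4j                   # complex
author_name = "Kia Kokalitcheva"         # str
is_published = True                      # bool
categories = ["Economy & Business", "Energy & Environment"]  # list
frame_shape = (28461, 6)                 # tuple
unique_tags = {"bank", "oil", "bank"}    # set: the duplicate "bank" is dropped
article = {"title": "Example", "author": author_name}        # dict

# Print the type name next to each value.
for value in (article_count, average_tokens, complex_value, author_name,
              is_published, categories, frame_shape, unique_tags, article):
    print(type(value).__name__, value)
```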

## Operators
- The sections below cover the main categories of operators in Python. Each category contains several specific operators, and understanding them is essential for writing effective Python code.

### 1. Arithmetic Operators 
- used to perform arithmetic operations such as addition, subtraction, multiplication, division, and more.
        + (addition)
        - (subtraction)
        * (multiplication)
        / (division)
        % (modulus)
        ** (exponentiation)
        // (floor division)
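
A quick sketch of each arithmetic operator, using two arbitrary integers picked for illustration:

```python
a, b = 17, 5

print(a + b)    # 22
print(a - b)    # 12
print(a * b)    # 85
print(a / b)    # 3.4  (true division always returns a float)
print(a % b)    # 2    (remainder of 17 divided by 5)
print(a ** 2)   # 289
print(a // b)   # 3    (quotient rounded down)
print(-a // b)  # -4   (floor division rounds toward negative infinity)
```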
        
### 2. Comparison Operators
- used to compare two values and return a Boolean value (True or False)
        == (equal to)
        != (not equal to)
        < (less than)
        > (greater than)
        <= (less than or equal to)
        >= (greater than or equal to)
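
The comparisons can be tried on a pair of made-up token counts (the variable names are illustrative):

```python
tokens_a, tokens_b = 120, 300

print(tokens_a == tokens_b)    # False
print(tokens_a != tokens_b)    # True
print(tokens_a < tokens_b)     # True
print(tokens_a >= tokens_b)    # False

# Comparisons can be chained, as in mathematics.
print(100 <= tokens_a < 200)   # True

# String comparison is case-sensitive.
print("Economy" == "economy")  # False
```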
        
        
### 3. Logical operators
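- used to combine two or more Boolean expressions. With Boolean operands the result is a Boolean; more generally, **and** and **or** return one of their operands, while **not** always returns True or False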
        and (logical and)
        or (logical or)
        not (logical not)
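
A small sketch of the three logical operators. The flag names are invented for this example.

```python
is_long = True
is_recent = False

print(is_long and is_recent)   # False
print(is_long or is_recent)    # True
print(not is_long)             # False

# With non-Boolean operands, and / or return one of the operands.
fallback = "" or "unknown author"   # empty string is falsy, so the right side wins
print(fallback)                     # unknown author
picked = [1, 2] and "non-empty"     # non-empty list is truthy, so the right side is returned
print(picked)                       # non-empty
```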
        
### 4. Assignment operators
- used to assign values to variables. 
        = (simple assignment)
        += (addition assignment)
        -= (subtraction assignment)
        *= (multiplication assignment)
        /= (division assignment)
        %= (modulus assignment)
        **= (exponentiation assignment)
        //= (floor division assignment)
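
Each augmented assignment below updates the same variable in turn. The starting value is arbitrary, and the comments show the value after each step.

```python
total = 10    # simple assignment
total += 5    # 15
total -= 3    # 12
total *= 2    # 24
total /= 4    # 6.0  (division produces a float)
total **= 2   # 36.0
total //= 5   # 7.0
total %= 4    # 3.0
print(total)  # 3.0
```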

### 5. Bitwise operators
- used to perform bitwise operations on integer values
        & (bitwise and)
        | (bitwise or)
        ^ (bitwise xor)
        ~ (bitwise not)
        << (left shift)
        >> (right shift)
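
To see what the bitwise operators do, it helps to write the integers in binary. The two values here are arbitrary.

```python
a, b = 0b1100, 0b1010   # 12 and 10 in decimal

print(a & b)    # 8   (0b1000: bits set in both)
print(a | b)    # 14  (0b1110: bits set in either)
print(a ^ b)    # 6   (0b0110: bits set in exactly one)
print(~a)       # -13 (for integers, ~x equals -x - 1)
print(a << 2)   # 48  (shift left by two bits: multiply by 4)
print(a >> 2)   # 3   (shift right by two bits: floor-divide by 4)
```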
        
### 6. Identity operators
- used to compare the identity of two objects
        is (object identity)
        is not (negated object identity)
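
The difference between equality and identity is easiest to see with two lists that hold the same items. The list contents are illustrative.

```python
first = ["title", "author"]
second = ["title", "author"]
alias = first

print(first == second)       # True: same contents
print(first is second)       # False: two separate list objects
print(first is alias)        # True: both names refer to one object
print(first is not second)   # True

# Identity is the usual way to test for None.
value = None
print(value is None)         # True
```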
        
### 7. Membership operators
- used to test whether a value is a member of a sequence or collection. 
        in (membership test)
        not in (negated membership test)
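
Membership tests work on lists, strings, and dictionaries alike. The sample values below are made up for the example.

```python
columns = ["title", "author", "category"]
print("author" in columns)                  # True
print("region" not in columns)              # True

# On strings, in checks for a substring.
print("Valley" in "Silicon Valley Bank")    # True

# On dictionaries, in checks the keys, not the values.
article = {"title": "Example", "author": "A. Writer"}
print("title" in article)                   # True
print("Example" in article)                 # False
```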
        
## Common control flow statements
- Control flow refers to the order in which statements are executed in a program. Control flow statements alter that order based on certain conditions

        if statements: used to execute a block of code if a certain condition is true
        else statements: used in conjunction with if statements to execute a block of code if the condition is false
        elif statements: used to chain multiple if statements together, allowing for multiple conditions to be checked
        while loops: used to execute a block of code repeatedly as long as a certain condition is true
        for loops: used to iterate over a sequence of elements (such as a list or string) and execute a block of code for each element
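
The statements above can be combined in a short sketch. The token counts and the thresholds are invented for illustration.

```python
token_counts = [120, 450, 980, 60]
labels = []

# for loop with if / elif / else: label each count by size.
for count in token_counts:
    if count < 100:
        label = "short"
    elif count < 500:
        label = "medium"
    else:
        label = "long"
    labels.append(label)
    print(count, label)

# while loop: repeat as long as the condition holds.
remaining = 3
while remaining > 0:
    print("articles left:", remaining)
    remaining -= 1
```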






## Data Exploration
Raw data is reviewed with a combination of manual workflows and automated data-exploration techniques to explore data sets visually, look for similarities, patterns, and outliers, and identify relationships between variables.

### We will use variables and operators to explore our data

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28461 entries, 0 to 28460
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   article_link     28461 non-null  object
 1   title            28461 non-null  object
 2   author           28461 non-null  object
 3   category         28461 non-null  object
 4   article_content  28461 non-null  object
 5   article_tokens   28461 non-null  object
dtypes: object(6)
memory usage: 1.3+ MB


We can see that all of the columns (article_link, title, author, category, article_content, article_tokens) have an **object datatype**.

*In pandas, object is the general-purpose dtype: a column of this type can hold any Python object, and it is most commonly used for text (strings).*
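
As a small sketch, separate from the dataset, a Series that mixes value types falls back to the object dtype, while a Series of integers gets a numeric dtype. The values are invented for illustration.

```python
import pandas as pd

# Mixed Python objects force the general-purpose object dtype.
mixed = pd.Series(["Silicon Valley", 42, ["a", "list"]])
print(mixed.dtype)                         # object
element_types = [type(v).__name__ for v in mixed]
print(element_types)                       # ['str', 'int', 'list']

# A homogeneous integer Series gets an integer dtype instead.
numbers = pd.Series([1, 2, 3])
print(numbers.dtype)
```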

### Question 1: How many articles has Kia Kokalitcheva written?

In [30]:
# We will use a variable, a comparison operator, and a for loop
author = 'Kia Kokalitcheva'

# Method 1: loop over the column and collect the matching values
count = []
for i in df['author']:
    if i == author:
        count.append(i)
print("The author has written", len(count), "articles.")

# Method 2: more advanced, but shorter. Filter with a Boolean mask and count the rows
count = df[df['author'] == author].shape[0]
print("The author has written", count, "articles.")

The author has written 198 articles.
The author has written 198 articles.


### Question 2: How many articles have a category of Economy & Business or Energy & Environment?

In [47]:
# Using a for loop with the logical operator or, then the element-wise | operator on pandas Series

# Method 1: plain Python values are combined with or
var_1 = 'Economy & Business'
var_2 = 'Energy & Environment'
count = []
for i in df['category']:
    if i == var_1 or i == var_2:
        count.append(i)
print('The total number of articles is', len(count))

# Method 2: Boolean Series are combined element-wise with |
count = ((df['category'] == var_1) | (df['category'] == var_2)).sum()
print('The total number of articles is', count)


The total number of articles is 6531
The total number of articles is 6531


### Using the ternary operator in code
#### value_if_true if condition else value_if_false


In [57]:
# Example 
x = 10
y = 20
z = 30
value = z if x == y else (x if x < z and z == y else y)
print(value)

20


In [80]:
# Trying it with strings
author = 'Kia Kokalitcheva'
var_1 = 'Economy & Business'
var_2 = 'Energy & Environment'

# .any() must be called: the bare attribute .any is a method object and is always truthy
result = 'good' if ((df['author'] == author) & (df['category'] == var_1)).any() else 'bad'
print(result)

good


## Working with Lists
- The DataFrame's column names will serve as the values of the list

Many functions and methods can be used to manipulate the contents of a list. Here are some of the most common ones:

    append(): adds an item to the end of the list
    insert(): adds an item at a specified index
    remove(): removes the first occurrence of a specified item
    pop(): removes and returns the item at a specified index (or the last item if no index is specified)
    index(): returns the index of the first occurrence of a specified item
    count(): returns the number of times a specified item appears in the list
    sort(): sorts the items in the list in place, in ascending order by default
    reverse(): reverses the order of the items in the list in place
    len(): a built-in function, called as len(my_list), that returns the number of items in the list
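
The cells below cover append() and insert(). The remaining operations can be sketched on a small list of first names (an illustrative list, not taken from the dataset):

```python
authors = ["Kia", "Rebecca", "Andrew", "Kia"]

authors.remove("Kia")            # removes only the first "Kia"
print(authors)                   # ['Rebecca', 'Andrew', 'Kia']

last = authors.pop()             # removes and returns the last item
print(last)                      # Kia

print(authors.index("Andrew"))   # 1

authors.append("Andrew")
print(authors.count("Andrew"))   # 2

authors.sort()                   # sorts in place and returns None
print(authors)                   # ['Andrew', 'Andrew', 'Rebecca']

authors.reverse()                # reverses in place
print(authors)                   # ['Rebecca', 'Andrew', 'Andrew']

print(len(authors))              # 3
```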

In [102]:
columns = df.columns.tolist()
print(columns)

['article_link', 'title', 'author', 'category', 'article_content', 'article_tokens']


In [103]:
# Extracting a range of values with a slice (the end index, 5, is excluded)
print('Extracting indexes 3:5', columns[3:5])

Extracting indexes 3:5 ['category', 'article_content']


In [104]:
# Modifying the list by adding 'region'
columns.append('region')
print(columns)

['article_link', 'title', 'author', 'category', 'article_content', 'article_tokens', 'region']


In [105]:
# Inserting 'art' at a specific index (0, the start of the list)
columns.insert(0, 'art')
print(columns)

['art', 'article_link', 'title', 'author', 'category', 'article_content', 'article_tokens', 'region']


In [111]:
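# Using enumerate to get each item together with its index
# (the output below appears to reflect the list after it was sorted in descending order in cells not shown here)

for index, value in enumerate(columns):
    print(index, value)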

0 title
1 region
2 category
3 author
4 article_tokens
5 article_link
6 article_content
7 art


### Converting a list to a comma-separated string

In [112]:
csv = ', '.join(columns)
print(csv)

title, region, category, author, article_tokens, article_link, article_content, art


In [113]:
# it is now just a single string
type(csv)

str

In [116]:
# converting the string to a list
columns = csv.split(', ')
print(columns)

['title', 'region', 'category', 'author', 'article_tokens', 'article_link', 'article_content', 'art']


## Working with Tuples
- A tuple is an ordered collection of elements, just like a list, but once a tuple is created, it cannot be modified.
- Because tuples are immutable, they are particularly useful for grouping related data together and for returning multiple values from a function.
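
A short sketch of both points. The helper title_stats is hypothetical, written only for this example.

```python
def title_stats(title):
    """Return the character count and word count of a title as a tuple."""
    return len(title), len(title.split())

# The returned tuple can be unpacked into separate variables.
chars, words = title_stats("Biden to protect 16M acres in Alaska")
print(chars, words)

# Trying to change an element of a tuple raises a TypeError.
pair = ("category", "author")
try:
    pair[0] = "title"
except TypeError as error:
    print("Cannot modify a tuple:", error)
```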

In [131]:
columns = ('title', 'region', 'category', 'author', 'article_tokens', 'article_link', 'article_content', 'art', 'art')
type(columns)

tuple

In [132]:
print(len(columns))

9


In [133]:
print(columns[0])

title


In [134]:
print(columns.count('art'))

2


## Working with Sets
- Sets are used to store a collection of **unique elements.** They are particularly useful for testing membership, performing set operations like **union, intersection, and difference**, and removing duplicates from a list.

In [135]:
columns = {'title', 'region', 'category', 'author', 'article_tokens', 'article_link', 'article_content', 'art', 'art'}

In [136]:
type(columns)

set

In [137]:
# 'art' appears only once because a set keeps only unique elements; the printed order is arbitrary
print(columns)

{'title', 'article_tokens', 'region', 'art', 'article_link', 'category', 'author', 'article_content'}
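
The set operations mentioned at the start of this section can be sketched with two small sets of column names. The particular split between the two sets is chosen for illustration, and sorted() is used only to make the printed order predictable.

```python
original = {"article_link", "title", "author", "category"}
modified = {"title", "author", "category", "region", "art"}

print(sorted(original | modified))   # union: names in either set
print(sorted(original & modified))   # intersection: names in both sets
print(sorted(original - modified))   # difference: names only in original

# Membership test
print("region" in modified)          # True

# Removing duplicates from a list (a set does not keep the original order)
names = ["Kia", "Rebecca", "Kia"]
unique_names = sorted(set(names))
print(unique_names)                  # ['Kia', 'Rebecca']
```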
