### Automate CSV Reading and Writing
This script lets you read and write CSV files with the pandas library. The snippets below cover the most common tasks: reading a whole file or only selected columns and rows, converting a CSV to JSON, writing a new file, and appending or dropping columns and rows.

In [None]:
# Automate CSV
# pip install pandas
import pandas
# Write CSV File (done first so that test.csv exists for the examples below)
data = {"col1": ["a", "b", "c"], "col2": [1, 2, 3]}
data = pandas.DataFrame(data)
data.to_csv("test.csv", index=False)
# Read CSV File
data = pandas.read_csv("test.csv")
# Read Specific Columns
data = pandas.read_csv("test.csv", usecols=["col1", "col2"])
# Read only the first N Rows
data = pandas.read_csv("test.csv", nrows=5)
# Read Specific Rows and Columns
data = pandas.read_csv("test.csv", usecols=["col1", "col2"], nrows=5)
# Read CSV File and Convert to JSON (returns a JSON string)
data = pandas.read_csv("test.csv").to_json(orient="records")
# Append Column to CSV File (the list length must match the number of rows)
data = pandas.read_csv("test.csv")
data["col3"] = ["x", "y", "z"]
data.to_csv("test.csv", index=False)
# Append Row to CSV File
# DataFrame.append() was removed in pandas 2.0, so use pandas.concat() instead
data = pandas.read_csv("test.csv")
new_row = pandas.DataFrame([{"col1": "d", "col2": 4}])
data = pandas.concat([data, new_row], ignore_index=True)
data.to_csv("test.csv", index=False)
# Drop Column from CSV File
data = pandas.read_csv("test.csv")
data = data.drop(columns=["col3"])
data.to_csv("test.csv", index=False)
# Drop Row from CSV File (by index label)
data = pandas.read_csv("test.csv")
data = data.drop([2])
data.to_csv("test.csv", index=False)
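
The read-modify-write pattern above loads the whole file into memory just to add one row. When you only need to add rows, a sketch like the following appends directly with `to_csv(mode="a")`. The file name `log.csv` is a hypothetical example, and the new rows must use the same column order as the file, because no header is written to check them against.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "log.csv")

# Create the file once, including the header row
pd.DataFrame({"col1": ["a", "b"], "col2": [1, 2]}).to_csv(path, index=False)

# Append rows in place: mode="a" adds to the end of the file without
# reading it back, and header=False avoids repeating the column names
new_rows = pd.DataFrame({"col1": ["c"], "col2": [3]})
new_rows.to_csv(path, mode="a", header=False, index=False)

result = pd.read_csv(path)
print(result)
```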

### Increase Photo Definition

Upscale your photos by a factor of 2x, 3x, or 4x with this script, which uses the super-image module. The module provides pretrained deep-learning super-resolution models (this script uses EDSR) that predict the missing pixels when an image is enlarged, instead of simply interpolating between neighboring pixels.

In [None]:
# Upscale Your Photos
# pip install super-image
# pip install pillow
from super_image import EdsrModel, ImageLoader
from PIL import Image

def upscale_image(img_file, scale_value):
    # EDSR models expect a 3-channel RGB image (drops alpha from PNGs)
    img = Image.open(img_file).convert("RGB")
    # Pretrained EDSR weights; edsr-base supports scale factors 2, 3 and 4
    model = EdsrModel.from_pretrained('eugenesiow/edsr-base', scale=scale_value)
    inputs = ImageLoader.load_image(img)
    preds = model(inputs)
    ImageLoader.save_image(preds, './upscale.png')

upscale_image('test.jpg', 4)
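
To judge what the model adds, it helps to compare its output against a classical upscale. The sketch below uses Pillow's bicubic resampling, with no machine learning, to produce an image of the same size that a scale factor of 4 gives. The `bicubic_upscale` name and the solid-color test image are illustrative only.

```python
from PIL import Image


def bicubic_upscale(img, scale):
    # Output dimensions are the input dimensions multiplied by the scale
    # factor, the same size relation as super-image's scale parameter
    width, height = img.size
    return img.resize((width * scale, height * scale), Image.BICUBIC)


small = Image.new("RGB", (64, 48), color=(200, 100, 50))
big = bicubic_upscale(small, 4)
print(big.size)  # (256, 192)
```

With a real photo, you could run both approaches on `test.jpg` and compare `upscale.png` with the bicubic result side by side.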

### PDF to Audio
Create your own audiobook or convert any PDF to audio with this script, which uses the text-to-speech and PyPDF2 modules. It is handy when you want to turn a whole PDF book or document into an audio file.

In [None]:
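# PDF to Audio
# pip install text-to-speech
# pip install PyPDF2
from text_to_speech import speak
from PyPDF2 import PdfReader

def PDFtoAudio(pdf_path):
    text = []
    with open(pdf_path, 'rb') as f:
        # PdfFileReader and extractText() no longer work in PyPDF2 3.0;
        # PdfReader and extract_text() are the current names
        pdf = PdfReader(f)
        for page in pdf.pages:
            # Image-only pages yield no text, so fall back to an empty string
            text.append(page.extract_text() or "")
    speak(' '.join(text), 'en', save=True, file='audio_book.mp3')

PDFtoAudio('test.pdf')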
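
Text extracted from a PDF often keeps the page's hard line breaks and end-of-line hyphens, which a text-to-speech engine may read with odd pauses or split words. A minimal cleanup step, sketched here as a hypothetical helper `clean_pdf_text`, could be applied to the joined text before it is passed to `speak()`:

```python
import re


def clean_pdf_text(raw):
    # Rejoin words split across lines with a hyphen: "parti-\ncular" -> "particular"
    text = re.sub(r"-\r?\n(\w)", r"\1", raw)
    # Collapse the remaining line breaks and runs of whitespace into single spaces
    text = re.sub(r"\s+", " ", text)
    return text.strip()


sample = "This is some sample text but this parti-\ncular word\nwas split."
print(clean_pdf_text(sample))
# This is some sample text but this particular word was split.
```

Be aware that a genuinely hyphenated compound that happens to fall at a line break (for example, "well-known") also loses its hyphen.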

### Generate Text
Have you ever wanted Python to write a whole text from nothing more than a topic? This script does exactly that with the Hugging Face transformers library, which runs the GPT-2 model in the background to generate text from a given prompt.

In [None]:
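# Generate Text
# pip install transformers
# pip install torch  (transformers also needs a backend such as PyTorch)
from transformers import pipeline

# Load the GPT-2 text-generation pipeline once and reuse it for every call
gen = pipeline("text-generation", model="gpt2")

def Generate_Text(txt):
    # max_length counts the prompt tokens plus the generated tokens
    output = gen(txt, max_length=100, num_return_sequences=1)
    return output[0]['generated_text']

print(Generate_Text("Science of the future"))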
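
GPT-2 generates text autoregressively: it repeatedly predicts the next token from the tokens so far and appends it. The toy sketch below illustrates only that loop, using a word-bigram table and greedy selection in place of a neural network. It is not how GPT-2 computes its predictions, and the names `train_bigrams` and `generate` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    # Count how often each word is followed by each other word
    follows = defaultdict(Counter)
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, prompt, max_words):
    # Autoregressive loop: choose the next word from the last one,
    # append it, and use it as the context for the following step
    words = prompt.split()
    while len(words) < max_words:
        candidates = follows.get(words[-1])
        if not candidates:
            break  # no known continuation for this word
        # Greedy decoding: always take the most frequent follower
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


corpus = "the future of science is bright and the future of travel is fast"
model = train_bigrams(corpus)
print(generate(model, "science", 6))  # science is bright and the future
```

Replacing the greedy `most_common(1)` choice with random sampling weighted by the counts is the toy analogue of the sampling options that real text-generation models expose.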

### Send Marketing Emails
Sending marketing emails to your audience is routine for any business, and this script automates the process. It uses Mailjet's Python client (mailjet-rest); Mailjet's free plan lets you send 200 emails per day. All you need is an API key and secret from your Mailjet account.

In [None]:
# Send Marketing Emails
# pip install mailjet-rest
from mailjet_rest import Client
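# The "Messages" payload below follows Send API v3.1, so request that version
mailjet = Client(auth=("api_key", "api_secret"), version="v3.1")
email_data = {
    'Messages': [
        {
            "From": {
                "Email": "from_email",
                "Name": "from_name"
            },
            "To": [
                {
                    "Email": "to_email",
                    "Name": "to_name"
                }
            ],
            "Subject": "Test Email",
            "TextPart": "Hi there, this is a test email",
        }
    ]
}
result = mailjet.send.create(data=email_data)
print(result.status_code)
# Only report success when the API actually accepted the message
if result.status_code == 200:
    print("Email sent")
else:
    print("Sending failed:", result.json())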
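
A marketing send usually goes to a list of people rather than one address. The sketch below builds the same "Messages" payload as the script above, with one personalized message per recipient. `build_messages`, the recipient dictionaries, and the `{name}` template placeholder are illustrative assumptions, not part of the Mailjet client.

```python
def build_messages(sender_email, sender_name, recipients, subject, template):
    # One message per recipient so that each email can be personalized;
    # the dict layout mirrors the Send API v3.1 payload used above
    messages = []
    for r in recipients:
        messages.append({
            "From": {"Email": sender_email, "Name": sender_name},
            "To": [{"Email": r["email"], "Name": r["name"]}],
            "Subject": subject,
            "TextPart": template.format(name=r["name"]),
        })
    return {"Messages": messages}


recipients = [
    {"email": "ann@example.com", "name": "Ann"},
    {"email": "bob@example.com", "name": "Bob"},
]
payload = build_messages(
    "news@example.com", "Example Shop", recipients,
    "Spring sale", "Hi {name}, our spring sale starts today.",
)
print(len(payload["Messages"]))  # 2
```

The returned dictionary can be passed as `data=` to `send.create()` on the client from the script above. Mailjet may cap how many messages a single request can contain, so check its documentation before sending a large list in one call.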

### Airtable Extractor
To extract data from Airtable, use this script built on the airscraper module: it takes the shareable URL of an Airtable view, extracts the data, and stores it as a CSV file.

In [None]:
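# Airtable Scraper
# pip install airscraper
from airscraper import AirScraper

def Airtable(urls):
    # Pass the list of shareable URLs received as the argument
    scraper = AirScraper(urls)
    data = scraper.get_table().text
    print("Data: ", data)
    # Write as UTF-8 so non-ASCII cell values do not fail on Windows
    with open('data.csv', 'w', encoding='utf-8') as f:
        f.write(data)

Airtable(["https://airtable.com/123"])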
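
Assuming `get_table().text` returns CSV text, as the script above treats it when saving `data.csv`, you can parse it into rows instead of handling the raw string. This sketch uses Python's built-in `csv` module; the sample string is a hypothetical stand-in for the scraper output.

```python
import csv
import io


def parse_csv_text(csv_text):
    # csv.DictReader maps each row to a dict keyed by the header row
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(reader)


sample = "Name,Status\nTask A,Done\nTask B,Open\n"
rows = parse_csv_text(sample)
print(rows[0]["Name"], rows[1]["Status"])  # Task A Open
```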

### Read PDF Contents with OCR
Python is widely used for data analysis, but the data is not always in a convenient format. When it is stored in a PDF or an image (such as a JPG), it first has to be converted to text before it can be analyzed properly. Libraries such as PyPDF2 can extract the text of a PDF directly, but this has drawbacks: a scanned PDF contains only images of its pages, and PDFs can store their text with a variety of font encodings, so direct conversion may garble characters or lose data.

This section shows how to read all the contents of a PDF file with OCR (Optical Character Recognition) and store them in a text document. First, the pages of the PDF are converted to images; then OCR reads the content of each image, and the result is stored in a text file.

Part #1 converts the PDF into image files, one per page. The images are named after the page number, zero-padded to three digits: PDF page 1 -> page_001.jpg, PDF page 2 -> page_002.jpg, PDF page 3 -> page_003.jpg, and so on.
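
The script formats page numbers with the `:03` format spec, which pads them to three digits. The padding matters because file names sort alphabetically: without it, page_10.jpg sorts before page_2.jpg. A quick check, using a hypothetical helper `page_filenames`:

```python
def page_filenames(num_pages):
    # Zero-padded numbers (:03) make alphabetical order match page order
    return [f"page_{n:03}.jpg" for n in range(1, num_pages + 1)]


padded = page_filenames(12)
unpadded = [f"page_{n}.jpg" for n in range(1, 13)]

print(sorted(padded) == padded)      # True
print(sorted(unpadded) == unpadded)  # False: page_10.jpg sorts before page_2.jpg
```

Three digits keep the order correct for documents of up to 999 pages.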

Part #2 recognizes the text in the image files and stores it in a text file. Once the text is in a string variable, any processing can be applied to it. For example, many PDFs break a word that does not fit at the end of a line with a hyphen ('-') and continue it on the next line:
```
This is some sample text but this parti-
cular word could not be written in the same line.
```
For such words, a basic pre-processing step removes the hyphen and the line break to rebuild the full word. After all the pre-processing is done, the text is stored in a separate text file. The code reads its input from a PDF file named d.pdf.

Below is the implementation: 

In [None]:
# Requires Python 3.6 or higher due to f-strings

# install libs needed
# pip3 install pillow
# pip3 install pytesseract
# pip3 install pdf2image
# sudo apt-get install tesseract-ocr

# Import libraries
import platform
from tempfile import TemporaryDirectory
from pathlib import Path

import pytesseract
from pdf2image import convert_from_path
from PIL import Image

if platform.system() == "Windows":
	# We may need to do some additional downloading and setup...
	# Windows needs the Tesseract OCR engine installed separately:
	# https://github.com/UB-Mannheim/tesseract/wiki/Downloading-Tesseract-OCR-Engine

	pytesseract.pytesseract.tesseract_cmd = (
		r"C:\Program Files\Tesseract-OCR\tesseract.exe"
	)

	# Windows also needs poppler_exe
	path_to_poppler_exe = Path(r"C:\.....")
	
	# Put our output files in a sane place...
	out_directory = Path(r"~\Desktop").expanduser()
else:
	out_directory = Path("~").expanduser()	

# Path of the Input pdf
PDF_file = Path(r"d.pdf")

# Store all the pages of the PDF in a variable
image_file_list = []

text_file = out_directory / Path("out_text.txt")

def main():
	''' Main execution point of the program'''
	with TemporaryDirectory() as tempdir:
		# Create a temporary directory to hold our temporary images.

		"""
		Part #1 : Converting PDF to images
		"""

		# Read in the PDF file at 500 DPI
		if platform.system() == "Windows":
			pdf_pages = convert_from_path(
				PDF_file, 500, poppler_path=path_to_poppler_exe
			)
		else:
			pdf_pages = convert_from_path(PDF_file, 500)

		# Iterate through all the pages stored above;
		# enumerate() "counts" the pages for us, starting at 1
		for page_enumeration, page in enumerate(pdf_pages, start=1):
			# Declare a filename for each page of the PDF as JPG,
			# with the page number zero-padded to three digits:
			# PDF page 1 -> page_001.jpg
			# PDF page 2 -> page_002.jpg
			# ....
			# PDF page 12 -> page_012.jpg
			# Joining with Path keeps the file inside tempdir on every OS
			# (a hard-coded "\" separator only works on Windows)
			filename = Path(tempdir) / f"page_{page_enumeration:03}.jpg"

			# Save the image of the page in system
			page.save(filename, "JPEG")
			image_file_list.append(filename)

		"""
		Part #2 - Recognizing text from the images using OCR
		"""

		with open(text_file, "w") as output_file:
			# Open the file once, in write mode, so that the text of all
			# images goes into the same file and re-running the script
			# replaces the old output instead of appending a second copy

			# Iterate over the saved page images in page order
			for image_file in image_file_list:

				# Recognize the text in the image as a string using pytesseract
				text = pytesseract.image_to_string(Image.open(image_file))

				# Any string processing may be applied to text.
				# Here, basic formatting is done: in many PDFs, a word that
				# does not fit at the end of a line is split with a hyphen
				# and continued on the next line, e.g. "parti-\ncular".
				# To rejoin such words, replace every '-\n' with ''.
				text = text.replace("-\n", "")

				# Finally, write the processed text to the file.
				output_file.write(text)

			# At the end of the with .. output_file block
			# the file is closed after writing all the text.
		# At the end of the with .. tempdir block, the
		# TemporaryDirectory() we're using gets removed!	
	# End of main function!
	
if __name__ == "__main__":
	# We only want to run this if it's directly executed!
	main()


The script converts each page of the PDF to an image, reads the text from each image with OCR, and writes the content to a text file.

Advantages of this method include:

- It avoids direct text extraction, which can lose data because of the way the PDF encodes its text.
- Scanned pages can be recognized, since OCR works on the page image; even handwritten content can be recognized to some extent.
- It is possible to recognize only particular pages of the PDF (for example, with the first_page and last_page arguments of convert_from_path).
- The text is available as a variable, so any required pre-processing can be applied to it.

Disadvantages of this method include:
- The page images are written to disk. They are deleted when the temporary directory is cleaned up, but at 500 DPI they can take considerable space while the script runs.
- OCR cannot guarantee 100% accuracy, although computer-typed PDF documents are usually recognized with very high accuracy.
- Handwritten PDFs can still be recognized, but the accuracy depends on factors such as the handwriting and the page color.

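### Extract Price Data from a Zipline Bundle

This snippet shows how to extract price data out of a Zipline data bundle: it looks up assets by ticker symbol and reads a window of daily close prices through a DataPortal.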

In [None]:
import os

import pandas as pd

from zipline.data.bundles.core import load
from zipline.data.data_portal import DataPortal
from zipline.utils.calendar_utils import get_calendar
from zipline.utils.run_algo import load_extensions


# Load extensions (e.g. ~/.zipline/extension.py, where custom bundles
# are registered) if you have not already
load_extensions(True, [], False, os.environ)

# Use your bundle's name
bundle_data = load("quotemedia", os.environ, None)


as_of_date = pd.Timestamp("2024-01-01")
symbols = ["MSFT", "AMZN", "NVDA"]

# The asset finder looks up the assets stored in the Zipline bundle
asset_finder = bundle_data.asset_finder

# Get the list of Equity objects by ticker symbol
assets = asset_finder.lookup_symbols(symbols, as_of_date=as_of_date)

# Create a DataPortal, which handles finding the bundle's data
# and stitching together the bcolz files
data_portal = DataPortal(
    asset_finder=asset_finder,
    equity_daily_reader=bundle_data.equity_daily_bar_reader,
    trading_calendar=get_calendar("NYSE"),
    first_trading_day=pd.Timestamp("2000-01-03"),
)

# Get the price data. Field can be open, high, low, close, volume, or price.
# Use a trading session as end_dt: 2023-12-31 was a Sunday, and
# 2023-12-29 was the last NYSE session of 2023
data = data_portal.get_history_window(
    assets=assets,
    end_dt=pd.Timestamp("2023-12-29"),
    bar_count=100,
    frequency="1d",
    field="close",
    data_frequency="daily",
    ffill=True
)

# One row per trading session, one column per asset
print(data.tail())
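
`get_history_window` returns a pandas DataFrame with one row per trading session and one column per asset, so ordinary pandas tools apply to it. As an illustration, the sketch below computes daily and total returns. The `closes` DataFrame is synthetic data standing in for the bundle output, since the real values depend on your bundle.

```python
import pandas as pd

# Synthetic close prices standing in for get_history_window() output:
# one row per trading session, one column per asset (hypothetical values)
closes = pd.DataFrame(
    {"MSFT": [100.0, 102.0, 101.0], "AMZN": [50.0, 50.0, 55.0]},
    index=pd.to_datetime(["2023-12-27", "2023-12-28", "2023-12-29"]),
)

# Simple daily returns: price_t / price_(t-1) - 1; the first row is NaN, so drop it
returns = closes.pct_change().dropna()

# Total return over the whole window for each asset
total_return = closes.iloc[-1] / closes.iloc[0] - 1

print(returns)
print(total_return)
```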