# Astrology Verification via Language Model

This Jupyter notebook introduces an innovative project aimed at testing the validity of astrology. The project involves the use of a language model to analyze the biographies of people born on specific dates, corresponding to each astrological sign, and summarizing their characteristics. These summaries are then used to see if a certain astrological sign could be assigned based on the identified characteristics.

The individuals are selected based on their birth date and their fame, ensuring a rich biography for the language model to process. The summaries generated by the language model serve as a character analysis based on the individuals' biographies. The language model is then tasked to assign an astrological sign to each person based on these summaries.

The underlying assumptions are that the character analysis based on the biography is accurate and that the correct astrological sign can be determined from these characteristics. This approach offers an intriguing way to examine the claims of astrology through the lens of data analysis and natural language processing.

The biographies will be processed through a local Large Language Model, or Ollama. This tool will assign astrology signs, assuming the model has absorbed enough modern astrological information to make similar conclusions about people as typically done in astrology.

First, we will create a data file by randomly selecting a certain number of renowned individuals born at the midpoint of an astrological sign. This ensures that the characteristics of the specific sign are at their strongest. We will retrieve the names of these individuals, along with the Wikipedia links to their biographies.

Next, we will automatically scrape all the biographies into a CSV or SQL file. This will result in a table containing birthdates, names, and biographies.

Afterward, the OLAMA model will extract characteristics from this data. We will ensure the few-shot prompt functions correctly and verify that it provides the required results.

In this section, we'll loop through the biographies and use them as context for deriving personal information. If there's a 'Personal Life' section available in the Wikipedia page, we'll just take this section. Otherwise, we'll use the whole biography. The derived personal information will be inputted into the OLAMA() function, resulting in a list of short characteristics for each individual.

We will use a method "few-shot-prompting" to generate the data that we want for our analysis. 

So, lets use it now to see if it can generate a astrological analysis, based on the knowledge of astrology that the language model might have. To test it, let's first read an astrological analysis of two to three people, their short biography and then the astrological sign in which they were born. Then we put their biographies as input, we let the model behave like an astrologer, and then see if we end up having the same conclusion. We ask the model to give a reason for why it chose what it chose. 

We have to erase the names of the persons. Write a method for that. It is possible to cut out the name of the person manually, but then the text is still so descriptive, that the language model will know who it is a bout, including the date of birth.

So the characteristics first go through a prompt like this: "Describe these traits as if they were from a random person
... , make no reference to Barack Obama"


But then still, much reference is made to the activities of the person and it could be easily known what he/she has done, that's why we throw it through another prompt. "Just summarize the characteristics of the person, without mentioning in any 
... way examples of his/her behavior"

# Using Mistral

In [60]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.llms import Ollama


#TODO: make it possible to choose between models


def local_mistral(question):
    llm = Ollama(model="mistral")
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a proficient and knowledgable person"),
        ("user", "{input}")
    ])
    output_parser = StrOutputParser()
    chain = prompt | llm | output_parser

    return chain.invoke({"input": f"{question}"})

def characteristics_of(person):
    prompt = f""" Describe the positive and negative character traits of {person}. Keep it very short
            """
    
    answer = local_mistral(prompt)
    
    return answer

def characteristics_of_2(person):
    '''prompt = f""" What are the motivations, values, and 
            behaviors of {person}. What are his healthy and unhealthy sides?
            """ '''
   
    prompt = f'describe {person} in a few words'

    answer = local_mistral(prompt)

    return answer



def unpersonal_characteristics(characteristics, person):

    prompt = f"""

    positive and negative traits: "{characteristics}"

    Based on these positive and negative traits, make
    a general overview of characteristics while making no reference to {person}. Just summarize the characteristics 
    of the person. So don't mention in any way examples of his/her 
    behavior. Keep it short. """
    
    answer = local_mistral(prompt)


    return answer

def assign_zodiac_to(traits):
    prompt = f""" traits: "{traits}""
            question: "What could be the astrology sign of this person based on these traits?"
            answer: [just answer with one word, for example: "Pisces", "Virgo", not two!]

            """
    answer = local_mistral(prompt)
    print(answer)

    return answer

def predicted_astro_sign(person):
    
    # generate characteristics of the person
    characteristics = characteristics_of_2(person)
    
    # transform the characteristics to unpersonal traits
    unpersonal_traits = unpersonal_characteristics(characteristics, person)
    
    # draw a zodiac sign based on the traits
    astro_sign = assign_zodiac_to(unpersonal_traits)
    
    #TODO: write something to redo the last method in case 
    # it does not output a single word
    
    return (unpersonal_traits, astro_sign)


'''person = 'Donald Trump'
predicted_astro_sign(person)'''
    

"person = 'Donald Trump'\npredicted_astro_sign(person)"

# Creating a vector store

# Scraping Astro websites



In [59]:
import pandas as pd
from bs4 import BeautifulSoup
import requests

astrological_signs = [
    "Aries",
    "Taurus",
    "Gemini",
    "Cancer",
    "Leo",
    "Virgo",
    "Libra",
    "Scorpio",
    "Sagittarius",
    "Capricorn",
    "Aquarius",
    "Pisces"
]


# List to store data
data = []


max_matches_per_sign = 50  # Maximum number of matches per astrological sign

for sign in astrological_signs:
    match_count = 0  # Counter for matches of the current astrological sign
    source = requests.get(f"https://astro-charts.com/persons/top/{sign.lower()}/").text
    soup = BeautifulSoup(source, 'html.parser')
    for match in soup.find_all('div', class_="celeb-info"):
        if match_count >= max_matches_per_sign:
            break
        name = match.find_all('p')[1].text
        data.append({"Name": name, "Sign": sign})
        match_count += 1

# Create DataFrame
df = pd.DataFrame(data)
print(len(df))

# Display DataFrame
print(df.head())

# Save DataFrame to a CSV file
# df.to_csv('astrological_signs.csv', index=False)

600
            Name   Sign
0  Matthew Healy  Aries
1      Lady Gaga  Aries
2  Conan O'Brien  Aries
3          Quavo  Aries
4   Mariah Carey  Aries


In [53]:
import time

# Initialize the new column with default values (e.g., None)
df['Unpersonalized description'] = None
df['Predicted'] = None


length = 1

def format_time(seconds):
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    seconds = int(seconds % 60)
    return f"{hours}h {minutes}m {seconds}s"

# Loop through the DataFrame using basic indexing
start_time = time.time()
time_stamps = []
for i in range(length):
    start_generating = time.time()
    name = df.iloc[i, 0]  # Get the name from the first column

    # track which number of generations we are
    print(f"Progress: {i}/{length}")

    # predict the astrology sign
    unpersonal_description, predicted_result = predicted_astro_sign(name)

    # save elapsed time 
    elapsed_time = time.time() - start_time

    # save the time that the generating took
    time_after_generating = time.time() - start_generating

    # add generating time to a list of generating times
    time_stamps.append(time_after_generating)

    # calculate average generating time
    average_generating_time = sum(time_stamps) / len(time_stamps)

    # generations to go
    to_go = length - i

    # expected time
    expected_time = format_time(to_go * average_generating_time)

    # print expected time
    print(f"the average generating time is {average_generating_time}, we have {to_go} more to go, so the expected time is {expected_time}")

    df.iloc[i, 2] = unpersonal_description  # Set the unpersonal traits in the third column
    df.iloc[i, 3] = predicted_result  # Set the predicted result in the fourth column


# Display the updated DataFrame
print(df.head())

Progress: 0/1
 Leo
the average generating time is 11.067599296569824, we have 1 more to go, so the expected time is 0h 0m 11s
            Name   Sign                         Unpersonalized description  \
0  Matthew Healy  Aries   This individual is a charismatic creative, ex...   
1      Lady Gaga  Aries                                               None   
2  Conan O'Brien  Aries                                               None   
3          Quavo  Aries                                               None   
4   Mariah Carey  Aries                                               None   

  Predicted  
0       Leo  
1      None  
2      None  
3      None  
4      None  


# Plotly plots

In [10]:
from dash import Dash, html, dcc, callback, Output, Input
import plotly.express as px
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/gapminder_unfiltered.csv')

app = Dash()

app.layout = [
    html.H1(children='Title of Dash App', style={'textAlign':'center'}),
    dcc.Dropdown(df.country.unique(), 'Canada', id='dropdown-selection'),
    dcc.Graph(id='graph-content')
]

@callback(
    Output('graph-content', 'figure'),
    Input('dropdown-selection', 'value')
)
def update_graph(value):
    dff = df[df.country==value]
    return px.line(dff, x='year', y='pop')

if __name__ == '__main__':
    app.run(debug=True)


NoLayoutException: Layout must be a dash component or a function that returns a dash component.

In [26]:
import time
import random
import sys

class TextGenerationProgressTracker:
    def __init__(self, total_length):
        self.total_length = total_length
        self.current_progress = 0
        self.start_time = time.time()
        self.time_stamps = []

    def update_progress(self):
        self.current_progress += 1
        current_time = time.time()
        self.time_stamps.append(current_time)
        self._print_progress()

    def _print_progress(self):
        elapsed_time = self.time_stamps[-1] - self.start_time
        mean_generation_time = elapsed_time / self.current_progress
        remaining_generations = self.total_length - self.current_progress
        estimated_time_remaining = mean_generation_time * remaining_generations

        progress_message = (
            f"\rProgress: {self.current_progress}/{self.total_length} | "
            f"Elapsed Time: {elapsed_time:.2f} seconds | "
            f"Estimated Time Remaining: {estimated_time_remaining:.2f} seconds"
        )
        sys.stdout.write(progress_message)
        sys.stdout.flush()

# Example usage
total_length = 100  # Total number of text pieces to generate
tracker = TextGenerationProgressTracker(total_length)

for i in range(total_length):
    # Simulate text generation process
    time.sleep(random.uniform(0.01, 0.1))  # Simulating variable generation time

    # Loop through the DataFrame using basic indexing
    for i in range(5):
        name = df.iloc[i, 0]  # Get the name from the first column
        predicted_result = predicted_astro_sign(name)
        df.iloc[i, 2] = predicted_result  # Set the predicted result in the third column


    tracker.update_progress()

# Print a newline at the end to move the cursor to the next line
print()

KeyboardInterrupt: 