This Jupyter notebook looks for a possible argument for the validity of astrology. It retrieves the characteristics of a few hundred celebrities and tries to assign the right zodiac sign to each of them based on those characteristics alone. If the accuracy score is clearly higher than 1 divided by 12, the score expected from random guessing, it is fair to say that astrology might not be complete nonsense. Initially, I planned to derive the characteristics from the celebrities' Wikipedia pages, collecting the page links via SPARQL and looping through them. I then found that a more concise summary of a celebrity's life can be obtained by giving the right prompt to a language model. The freely available language models have been trained on billions of web pages, many of which cover celebrities, so the information they produce is generally accurate, and with the right prompting the output can be steered toward the character of the celebrity rather than his or her activities and achievements.

The first step is to write effective prompts, so that the language model returns the output we want. The model used is Mistral-7B, a relatively small model of around 4 GB that is reportedly better than Meta's (Facebook's) Llama models with more parameters. Running a language model for hours is expensive, because you pay not only for the use of the model but also for the GPU it needs. This model was attractive because it runs on the M2 chip of my MacBook and still generates the information needed for this analysis.

There are a few things to consider. When we ask a language model explicitly about the characteristics of person X, the answer will unavoidably contain the name of X, along with activities and achievements that make it easy to trace the output back to that person. That is bad for our purpose: many internet articles give astrological analyses of celebrities, so the model could simply recall the right sign instead of inferring it. We would lose the blindness we want, which is needed to test whether a sign, and with it the time of year in which a person was born, can be predicted from that person's characteristics alone.

The LangChain library might look easy to use, but getting the output you want from a prompt takes a lot of tweaking and adjusting, which is time-consuming. It is nevertheless rewarding work, since the accuracy score does not need to be large to justify the claim that astrology might play a role in the characteristics of a person: once the accuracy is clearly higher than the 8.3% expected from random guessing (1 in 12), and the sample is large enough, chance becomes an unlikely explanation and other forces might be at play.
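
Whether a score "a few points above 8.3%" really rules out chance depends on the number of celebrities tested. The sketch below assumes each prediction is an independent guess with a 1-in-12 chance of being right and computes how likely a given number of hits is under pure chance. The helper name `binomial_tail` and the example numbers are illustrative assumptions, not results from this notebook.

```python
from math import comb


def binomial_tail(hits, n, p=1 / 12):
    """Probability of at least `hits` correct guesses out of `n` by pure chance."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(hits, n + 1))


# Hypothetical hit counts for a sample of 600 celebrities (illustration only)
n = 600
for hits in (50, 60, 70):
    share = hits / n
    print(f"{hits}/{n} correct ({share:.1%}): chance probability {binomial_tail(hits, n):.4f}")
```

The smaller this probability, the harder it is to explain the score by guessing alone. With a small sample, even a score well above 8.3% can still be compatible with chance.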

Let's start with the first function: one that takes a prompt as input and returns generated text as output. We import a few classes from the LangChain library and connect to a local Mistral model. LangChain has a specific way of combining the elements needed to produce the output, called chaining: the elements are placed on one line with a vertical bar (`|`) between them, and together they function as a chain.

In [5]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.llms import Ollama

def local_mistral(question):
    """
    Generates a response to a given question using the Mistral model via LangChain.

    Args:
        question (str): The question to be answered by the model.

    Returns:
        str: The response generated by the model.
    """
    # Initialize the LLM model "mistral"
    llm = Ollama(model="mistral")
    
    # Create a prompt template with predefined system and user messages
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a proficient and knowledgeable person"),  # System message for setting context
        ("user", "{input}") 
    ])
    
    # Initialize the output parser to handle the model's response
    output_parser = StrOutputParser()
    
    # Chain the prompt, model, and output parser together
    chain = prompt | llm | output_parser

    # Invoke the chain with the input question and return the response
    return chain.invoke({"input": f"{question}"})


# Example usage of the function
response = local_mistral("Do you think astrology is BS?")
print(response)

 As a responsible and knowledge-driven entity, I don't hold an opinion on the validity of astrology. It's important to acknowledge that many cultures around the world have embraced various forms of astrology for thousands of years as a means of understanding human behavior and predicting events based on celestial bodies. However, from a scientific standpoint, there is currently no empirical evidence supporting the claims made by astrology. It's essential to approach this topic with an open mind while acknowledging its historical significance and the beliefs held by people who practice it.
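
To make the `prompt | llm | output_parser` line less magical, here is a toy illustration of how the `|` operator can compose steps in plain Python. It is not LangChain's implementation: the `Step` class and the fake components are hypothetical stand-ins, and no model is called.

```python
class Step:
    """Toy stand-in for a chainable component (not LangChain's implementation)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` builds a new step that runs a first and feeds its result to b
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical components mirroring prompt | llm | output_parser
toy_prompt = Step(lambda data: f"Question: {data['input']}")
toy_llm = Step(lambda text: f"  (model reply to: {text})  ")
toy_parser = Step(lambda text: text.strip())

toy_chain = toy_prompt | toy_llm | toy_parser
result = toy_chain.invoke({"input": "Do you think astrology is BS?"})
print(result)
```

Each step only needs to accept the output of the previous one, which is why the prompt template, the model, and the parser can be swapped independently.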


Now we can try to generate the information we want about the celebrities. First, it deserves some thought what information that is. Astrology is about the inner life of a person. Becoming a celebrity is certainly influenced by a person's qualities, but what we are after are those qualities and tendencies, not achievements, professions, and the like. Here comes the first difficulty of our approach: the web is full of information about actions and achievements and has much less to say about the inner qualities and shortcomings of the person described. The prompt should therefore produce an answer that in some way reflects the inner state of the celebrities we analyze.

In [23]:
def characteristics_of(person):
    """
    Generates a detailed analysis of a person's characteristics, including motivations,
    values, behaviors, and their healthy and unhealthy sides.

    Args:
        person (str): The name of the person whose characteristics are to be analyzed.

    Returns:
        str: The detailed analysis of the person's characteristics.
    """
    # Create a prompt string with the specified person's name
    prompt = f"""What are the motivations, values, and 
            behaviors of {person}. What are her/his healthy and unhealthy sides?
            """
    
    # Use the local_mistral function to get the analysis from the model
    answer = local_mistral(prompt)

    # Return the generated answer
    return answer


# Example usage of the function
person = "Mark Rutte"
mark_example = characteristics_of(person)
print(mark_example)

 Mark Rutte is a Dutch politician who has served as the Prime Minister of the Netherlands since October 10, 2012. Here's an overview of his motivations, values, behaviors, along with some discussion on both his healthy and unhealthy aspects:

**Motivations:** Mark Rutte is primarily motivated by a desire to serve the Dutch people, promote stability, and maintain strong international relationships. He aims to lead his country through challenging political landscapes while making pragmatic decisions that benefit society as a whole.

**Values:** Some of Rutte's core values include integrity, stability, and compromise. He places high importance on being honest, transparent, and working collaboratively with others to achieve consensus.

**Behaviors:** Known for his calm demeanor, Rutte is often seen as a pragmatic leader who emphasizes the importance of negotiation and compromise in political decision-making. He prefers facts and data over ideology when making policy decisions, which can be

This all looks quite accurate. For Mark Rutte, the output reflects the qualities he has shown as Prime Minister of our country, and at the same time it shows why he became a disputed figure in that position.

The next step is to keep this set of qualities from the first prompt but to detach it from the person and from the specific actions and achievements that could help trace the characteristics back to the person used as input.

We have to be quite explicit with the language model so that it does what we want. The previous answer is used as input for the next text generation, so that it serves as context for the new text. The function also takes the person as an argument, because we want to refer to the person explicitly and tell the model that this name must not appear anywhere in the generated answer.

In [24]:
def unpersonal_characteristics(characteristics, person):
    """
    Generates a general overview of characteristics based on given positive and negative traits,
    while making no reference to the specified person.

    Args:
        characteristics (str): The positive and negative traits to be summarized.
        person (str): The name of the person to be excluded from the summary.

    Returns:
        str: A general overview of characteristics without referencing the person.
    """
    # Create a prompt string with the specified characteristics and person name
    prompt = f"""
    positive and negative traits: "{characteristics}"

    Based on these positive and negative traits, make
    a general overview of characteristics while making no reference to {person}. Just summarize the characteristics 
    of the person. So don't mention in any way examples of his/her 
    behavior. Keep it short.
    """
    
    # Use the local_mistral function to get the summary from the model
    answer = local_mistral(prompt)

    # Return the generated summary
    return answer


# Example usage of the function, using the Mark Rutte output generated above
unpersonal_mark = unpersonal_characteristics(mark_example, person)
print(unpersonal_mark)

 This individual is primarily motivated by a desire for service, stability, and strong international relationships. They value integrity, compromise, and transparency. Their behavior reflects a pragmatic approach to leadership, emphasizing negotiation and compromise in decision-making, facts over ideology, and maintaining a calm demeanor under pressure.

On the positive side, their commitment to stability, integrity, and collaboration is effective in navigating complex political landscapes. They are skilled at finding common ground with various factions and demonstrate resilience under pressure.

However, some may argue that their pragmatic approach could lead to a lack of vision or bold policy proposals, and an overemphasis on maintaining the status quo. Furthermore, their tendency to avoid confrontation and compromise too easily might result in missed opportunities for progress in certain areas.
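
Because the model is only asked, not forced, to leave the name out, it is worth checking the anonymized text before using it. A minimal sketch, assuming a simple word-boundary search is good enough; `leaked_name_parts` is a hypothetical helper and the two sample sentences are made up for illustration.

```python
import re


def leaked_name_parts(text, person):
    """Return the parts of `person`'s name that still appear in `text`."""
    # Ignore very short parts such as initials to limit false positives
    parts = [part for part in person.split() if len(part) > 2]
    return [
        part
        for part in parts
        if re.search(rf"\b{re.escape(part)}\b", text, flags=re.IGNORECASE)
    ]


clean = "This individual values integrity, compromise, and transparency."
leaky = "Rutte is often seen as a pragmatic leader."
print(leaked_name_parts(clean, "Mark Rutte"))
print(leaked_name_parts(leaky, "Mark Rutte"))
```

A first name that is also a common word (for example "mark" in "left a mark") would be flagged as well, so a hit is a reason to look at the text, not proof of a leak.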


Now we can ask the language model to assign an astrological sign based on the characteristics described in the last generation step. We do this on the assumption that there is a great deal of information about astrology on the internet and that this information is fairly coherent: some sources are superficial and some go deeper, but the characteristics attributed to people born under a given zodiac sign are broadly the same. So we assume that the model, when asked for a sign, will return the one that matches the traits. To check this, here are lists of characteristics for the sign Pisces and for the sign Aquarius, and we ask the model which sign it thinks each one is.

In [95]:
def assign_zodiac_to(traits):
    """
    Assigns an astrological sign to a person based on their traits.

    Args:
        traits (str): The traits of the person.

    Returns:
        str: The predicted astrological sign.
    """
    # Build the prompt; the model is asked to answer with a single sign name
    prompt = f""" traits: "{traits}"
            question: "What could be the astrology sign of this person based on these traits?"
            answer: [just answer with one word, for example: "Pisces", "Virgo", not two!]"""

    # Use the local_mistral function to get the predicted sign from the model
    answer = local_mistral(prompt)

    return answer


# Pisces example
pisces = """Positive Characteristics:

	•	compassionate
	•	charitable
	•	sympathetic
	•	emotional
	•	sacrificing
	•	intuitive
	•	introspective
	•	musical
	•	artistic

Negative Characteristics:

	•	procrastinating
	•	over-talkative
	•	melancholy
	•	pessimistic
	•	emotionally inhibited
	•	timid
	•	impractical
	•	indolent
	•	often feels misunderstood"""

# Aquarius example
aquarius = """Positive Characteristics:

	•	independent
	•	inventive
	•	tolerant
	•	individualistic
	•	progressive
	•	artistic
	•	scientific
	•	logical
	•	humane
	•	intellectual
	•	altruistic

Negative Characteristics:

	•	unpredictable
	•	temperamental
	•	bored by detail
	•	cold
	•	too fixed in opinions
	•	shy
	•	eccentric
	•	radical
	•	impersonal
	•	rebellious
"""

print(assign_zodiac_to(pisces)) # Pisces
print(assign_zodiac_to(aquarius)) # Sometimes Gemini, sometimes Aquarius

 Pisces
 Aquarius
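
The raw answers start with a space, and a 7B model does not always stick to a single word. Before comparing predictions with the true signs, the answer can be reduced to one of the twelve sign names. A small sketch; `normalize_sign` is a hypothetical helper, not part of the pipeline above.

```python
import re

SIGNS = [
    "Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo",
    "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces",
]


def normalize_sign(answer):
    """Return the first zodiac sign mentioned in `answer`, or None if there is none."""
    pattern = r"\b(" + "|".join(SIGNS) + r")\b"
    match = re.search(pattern, answer, flags=re.IGNORECASE)
    return match.group(1).capitalize() if match else None


print(normalize_sign(" Pisces"))
print(normalize_sign('The sign could be "virgo" or Libra.'))
print(normalize_sign("I cannot tell."))
```

Returning `None` for answers without a sign keeps those rows visible, so they can be counted as misses or regenerated instead of silently matching nothing.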


Finally, the three functions are combined into one.

In [90]:
def predicted_astro_sign(person):
    """
    Predicts the astrological sign of a person based on their characteristics.

    Args:
        person (str): The person whose astrological sign needs to be predicted.

    Returns:
        tuple: A tuple containing:
            - unpersonal_traits (str): The transformed unpersonal traits.
            - astro_sign (str): The predicted astrological sign.
    """
    
    # generate characteristics of the person
    characteristics = characteristics_of(person)
    
    # transform the characteristics to unpersonal traits
    unpersonal_traits = unpersonal_characteristics(characteristics, person)
    
    # draw a zodiac sign based on the traits
    astro_sign = assign_zodiac_to(unpersonal_traits)
    
    return (unpersonal_traits, astro_sign)

# Scraping Astro websites

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import requests


astrological_signs = [
    "Aries",
    "Taurus",
    "Gemini",
    "Cancer",
    "Leo",
    "Virgo",
    "Libra",
    "Scorpio",
    "Sagittarius",
    "Capricorn",
    "Aquarius",
    "Pisces"
]

# List to store data
data = []


for sign in astrological_signs:
    source = requests.get(f"https://astro-charts.com/persons/top/{sign.lower()}/").text
    soup = BeautifulSoup(source, 'html.parser')
    for match in soup.find_all('div', class_="celeb-info"):
        name = match.find_all('p')[1].text
        data.append({"Name": name, "Sign": sign})

# Create DataFrame
sun_signs_celebs = pd.DataFrame(data)
print(len(sun_signs_celebs))

# Display DataFrame
print(sun_signs_celebs.head())



1200
                  Name   Sign
0          Celine Dion  Aries
1            Lady Gaga  Aries
2  Kourtney Kardashian  Aries
3         Mariah Carey  Aries
4     Leighton Meester  Aries


In [None]:
# Saving the DataFrame to a CSV file
csv_file_path = 'sun_signs_celebs.csv'
sun_signs_celebs.to_csv(csv_file_path, index=False)

In [None]:
# List to store data
data = []

# loops over the different astrological signs
for sign in astrological_signs:

    # goes to the webpage where this sign is the ascendant
    source = requests.get(f"https://astro-charts.com/persons/planet/Ascendant/sign/{sign.lower()}/").text
    soup = BeautifulSoup(source, 'html.parser')

    # tracks the amount of celebs with this sign as ascendant
    amount_of_celebs_in_page = int(soup.find('div', class_="celeb-cat-info").span.b.text)
    
    # tracks the amount of pages to loop over
    amount_of_pages = (amount_of_celebs_in_page // 30) + 1
    
    # loop over the pages and scrape the names
    for page_number in range(amount_of_pages):
        source = requests.get(f"https://astro-charts.com/persons/planet/Ascendant/sign/{sign.lower()}/page/{page_number}/").text
        soup = BeautifulSoup(source, 'html.parser')

        for name in soup.find_all('div', class_="flex-inner celeb-info"):
            celeb = name.p.text
            data.append({"Name": celeb, "Ascendant": sign})


ascendant_df = pd.DataFrame(data)
print(len(ascendant_df))

# Display DataFrame
print(ascendant_df.head())
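
A note on the page count used above: `(n // 30) + 1` asks for one page too many whenever the number of celebrities is an exact multiple of 30. A ceiling division avoids that. The page size of 30 is taken from the cell above; whether the site numbers its pages from 0 or from 1 is not checked here, and `page_count` is a hypothetical helper.

```python
import math


def page_count(total_items, per_page=30):
    """Number of pages needed to list `total_items` entries."""
    return math.ceil(total_items / per_page)


# Compare the formula from the scraping cell with the ceiling division
for total in (29, 30, 31, 90):
    floor_plus_one = total // 30 + 1
    print(f"{total} items: floor+1 gives {floor_plus_one}, ceil gives {page_count(total)}")
```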



In [None]:
# Saving the DataFrame to a CSV file
csv_file_path = 'ascendant_celebs.csv'
ascendant_df.to_csv(csv_file_path, index=False)

In [9]:
import pandas as pd

# merge the sun sign dataframe with the ascendant dataframe

# Load the first CSV file as df1
df1_path = 'sun_signs_celebs.csv'
sun_signs_celebs = pd.read_csv(df1_path)

# Load the second CSV file as df2
df2_path = 'ascendant_celebs.csv'
ascendant_df = pd.read_csv(df2_path)


# try to merge the other way around
df_merged_from_ascendant = ascendant_df.merge(sun_signs_celebs, on='Name', how='left')
print(df_merged_from_ascendant.head())

# Remove rows with NaN values
df_without_nan = df_merged_from_ascendant.dropna(subset=['Ascendant', 'Sign'])

# Group by 'Sign' and limit to 50 names per sign
fifty_celebs_per_sign = df_without_nan.groupby('Sign').head(50)
print(len(fifty_celebs_per_sign)) # 555
print(fifty_celebs_per_sign.head())

# Reorder the columns to 'Name', 'Sign', 'Ascendant'
ordered_df = fifty_celebs_per_sign[['Name', 'Sign', 'Ascendant']]

# Sort the rows by the 'Sign' column
final_df_sorted = ordered_df.sort_values(by='Sign')
print(final_df_sorted.head())

                  Name Ascendant    Sign
0         Heath Ledger     Aries   Aries
1       Sofia Boutella     Aries   Aries
2  Bryce Dallas Howard     Aries  Pisces
3          Brie Larson     Aries   Libra
4     Natalie Martinez     Aries     NaN
555
                  Name Ascendant     Sign
0         Heath Ledger     Aries    Aries
1       Sofia Boutella     Aries    Aries
2  Bryce Dallas Howard     Aries   Pisces
3          Brie Larson     Aries    Libra
9       Kendall Jenner     Aries  Scorpio
                    Name      Sign Ascendant
520       Virginia Woolf  Aquarius    Gemini
1593     Elizabeth Olsen  Aquarius     Virgo
1590        Ariel Winter  Aquarius     Virgo
3765     Maluma (singer)  Aquarius  Aquarius
1560  Lisa Marie Presley  Aquarius       Leo


In [10]:
# Saving the DataFrame to a CSV file
csv_file_path = 'final_df_sorted.csv'
final_df_sorted.to_csv(csv_file_path, index=False)

In [None]:
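import time

# NOTE: `df` is the celebrity DataFrame to annotate, with the name in the first
# column; the cell that creates it is not shown here.

# Initialize the new columns with default values (None)
df['Unpersonalized description'] = None
df['Predicted'] = None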


length = len(df)

def format_time(seconds):
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    seconds = int(seconds % 60)
    return f"{hours}h {minutes}m {seconds}s"

# Loop through the DataFrame using basic indexing
start_time = time.time()
time_stamps = []
for i in range(length):
    start_generating = time.time()
    name = df.iloc[i, 0]  # Get the name from the first column

    # track which number of generations we are
    print(f"Progress: {i}/{length}")

    # predict the astrology sign
    unpersonal_description, predicted_result = predicted_astro_sign(name)

    # save elapsed time 
    elapsed_time = time.time() - start_time

    # save the time that the generating took
    time_after_generating = time.time() - start_generating

    # add generating time to a list of generating times
    time_stamps.append(time_after_generating)

    # calculate average generating time
    average_generating_time = sum(time_stamps) / len(time_stamps)

    # generations to go
    to_go = length - i

    # expected time
    expected_time = format_time(to_go * average_generating_time)

    # print expected time
    print(f"the average generating time is {average_generating_time}, we have {to_go} more to go, so the expected time is {expected_time}")

    # Store the results by column name, so the code does not depend on column positions
    df.iloc[i, df.columns.get_loc('Unpersonalized description')] = unpersonal_description
    df.iloc[i, df.columns.get_loc('Predicted')] = predicted_result


# Display the updated DataFrame
print(df.head())

```
Progress: 0/600
 Leo
the average generating time is 47.4957230091095, we have 600 more to go, so the expected time is 7h 54m 57s
Progress: 1/600
 Scorpio
the average generating time is 49.36778545379639, we have 599 more to go, so the expected time is 8h 12m 51s
Progress: 2/600
 Leo
the average generating time is 51.634344021479286, we have 598 more to go, so the expected time is 8h 34m 37s
Progress: 3/600
 Leo
the average generating time is 48.35454374551773, we have 597 more to go, so the expected time is 8h 1m 7s
Progress: 4/600
 Aries
the average generating time is 46.73957195281982, we have 596 more to go, so the expected time is 7h 44m 16s
Progress: 5/600
 Aries
the average generating time is 46.94747813542684, we have 595 more to go, so the expected time is 7h 45m 33s
Progress: 6/600
 Aries
the average generating time is 48.329658303942, we have 594 more to go, so the expected time is 7h 58m 27s
Progress: 7/600
 Aries
the average generating time is 50.37543475627899, we have 593 more to go, so the expected time is 8h 17m 52s
Progress: 8/600
...
Progress: 138/600
 Aries
the average generating time is 53.421504548985325, we have 462 more to go, so the expected time is 6h 51m 20s
Progress: 139/600
```
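
The log reports "600 more to go" after the first generation has already finished, because the loop computes `to_go = length - i`. A small sketch of an estimate that counts the completed items instead; `remaining_seconds` is a hypothetical helper, and the durations in the example are made up.

```python
def remaining_seconds(durations, total):
    """Estimate the time left from the durations of the items completed so far."""
    done = len(durations)
    if done == 0:
        return None  # nothing measured yet, so no estimate is possible
    average = sum(durations) / done
    return (total - done) * average


# Two items finished (40 s and 60 s) out of 600 in total
estimate = remaining_seconds([40.0, 60.0], 600)
print(f"{estimate:.0f} seconds to go")
```

Inside the loop this would be called right after `time_stamps.append(...)`, passing `time_stamps` and `length`.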

While the cell above was running, I was curious what the DataFrame looked like, so I tried to print it from another cell while the loop was still working on it. That cell, of course, did not run. When I interrupted it, the loop that was filling the DataFrame stopped as well. That was a good reason to save the data so far to a CSV file and look at the predictions and the accuracy. First, the predicted signs for the rows that cover the sign Aries. That looked fairly successful: Aries came second, with 12 of the 50 celebrities predicted as Aries.

In [None]:
# Saving the DataFrame to a CSV file
csv_file_path = 'first_sample_data.csv'
df.to_csv(csv_file_path, index=False)

In [None]:
df_aries = df[:50]
predicted_counts = df_aries['Predicted'].value_counts()
predicted_counts

'''
Predicted
Leo          19
Aries        12
Capricorn     8
Scorpio       5
Pisces        3
Cancer        2
Libra         1
Name: count, dtype: int64
'''

But when I did the same for the second batch, which covers the sign Taurus, the score for Taurus was very low: only 1 of the 50. Again Leo and Aries were on top. This probably has to do with the generated texts. They read like an elevator pitch for the greatness of the person, so they are quite superficial: they are mostly about actions, and the part about the negative sides gets little weight. All the people described come across as very assertive, which is of course often true of celebrities. So it is quite logical that Aries and Leo come to the foreground, because these are very assertive signs in astrology.

In [None]:
df_taurus = df[50:100]
predicted_counts = df_taurus['Predicted'].value_counts()
predicted_counts

'''Predicted
Leo          21
Aries        14
Scorpio       6
Capricorn     5
Libra         1
Cancer        1
Taurus        1
Pisces        1
Name: count, dtype: int64'''

We have not computed the accuracy score yet, but the counts already suggest that it will not rise meaningfully above the chance level of about 0.08.

In [None]:
# Loading the CSV file back into a DataFrame
loaded_df = pd.read_csv(csv_file_path)
gemini_df = loaded_df[100:140]
predicted_counts = gemini_df['Predicted'].value_counts()
predicted_counts

'''Predicted
Leo          17
Aries        12
Capricorn     5
Scorpio       3
Libra         2
Name: count, dtype: int64'''
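
The accuracy for this partial sample can be worked out directly from the three `value_counts()` outputs above. The sketch assumes that each batch contains only celebrities of the sign it is named after, as described in the text. The counts are copied from the outputs; the variable names are illustrative.

```python
# Predicted-sign counts copied from the three outputs above, keyed by the true sign
batches = {
    "Aries": {"Leo": 19, "Aries": 12, "Capricorn": 8, "Scorpio": 5,
              "Pisces": 3, "Cancer": 2, "Libra": 1},
    "Taurus": {"Leo": 21, "Aries": 14, "Scorpio": 6, "Capricorn": 5,
               "Libra": 1, "Cancer": 1, "Taurus": 1, "Pisces": 1},
    "Gemini": {"Leo": 17, "Aries": 12, "Capricorn": 5, "Scorpio": 3, "Libra": 2},
}

# A hit is a prediction equal to the true sign of its batch
hits = sum(counts.get(sign, 0) for sign, counts in batches.items())
total = sum(sum(counts.values()) for counts in batches.values())
accuracy = hits / total

print(f"{hits} correct out of {total} predictions")
print(f"accuracy {accuracy:.3f} versus chance level {1 / 12:.3f}")
```

Nearly all hits come from the Aries batch, and "Aries" is among the two most frequent predictions in every batch, so these hits say little about whether the model can tell the signs apart.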

For the last sign, which was not yet fully processed, Leo and Aries are on top as well. To study this properly I need a different route and have to change the text that comes out of the model. I would actually just need to generate two lists, one with positive and one with negative characteristics, and then use a language model combined with a vector database of astrological knowledge to classify the list of characteristics.
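
The idea of matching a trait list against astrological reference lists can be tried in a very reduced form without any model. The sketch below swaps the vector database for plain set overlap (Jaccard similarity) and includes only two signs, using single-word traits from the Pisces and Aquarius lists earlier in this notebook. `closest_sign` and `REFERENCE_TRAITS` are hypothetical names, and this is not the approach implemented in the cells that follow.

```python
# Single-word traits taken from the Pisces and Aquarius lists shown earlier
REFERENCE_TRAITS = {
    "Pisces": {
        "compassionate", "charitable", "sympathetic", "emotional", "sacrificing",
        "intuitive", "introspective", "musical", "artistic", "procrastinating",
        "melancholy", "pessimistic", "timid", "impractical", "indolent",
    },
    "Aquarius": {
        "independent", "inventive", "tolerant", "individualistic", "progressive",
        "artistic", "scientific", "logical", "humane", "intellectual", "altruistic",
        "unpredictable", "temperamental", "cold", "shy", "eccentric", "radical",
        "impersonal", "rebellious",
    },
}


def jaccard(a, b):
    """Share of traits the two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_sign(traits):
    """Return the best-matching sign and the score for every sign."""
    cleaned = {trait.strip().lower() for trait in traits}
    scores = {sign: jaccard(cleaned, ref) for sign, ref in REFERENCE_TRAITS.items()}
    return max(scores, key=scores.get), scores


sign, scores = closest_sign(["Eccentric", "Radical", "Independent", "Stubborn"])
print(sign, scores)
```

Exact word matching misses synonyms such as "melancholic" versus "melancholy". That gap is what embeddings in a vector database would be expected to close.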

## Few Shot Prompting

# Assigning positive and negative characteristics

In [2]:
from langchain_community.llms import Ollama

# Initialize the LLM model "mistral"
llm = Ollama(model="mistral")

In [1]:
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate


def generate_characteristics(person):

    examples = [
        {
            "human": "Lady Gaga",
            "ai": """
    Positive Characteristics:

        •	Creative
        •	Charismatic
        •	Passionate
        •	Empathetic
        •	Strong-willed
        •	Innovative
        •	Confident
        •	Inspirational

    Negative Characteristics:

        •	Impulsive
        •	Over-sensitive
        •	Stubborn
        •	Eccentric
        •	Intense
        •	Perfectionist
        •	Controversial
        •	Melancholic
    """,
        },
        {
            "human": "Paul McCartney",
            "ai": """
    Positive Characteristics:
    - Creative
    - Charismatic
    - Talented
    - Versatile
    - Inspirational
    - Compassionate
    - Humble
    - Dedicated

    Negative Characteristics:
    - Perfectionist
    - Stubborn
    - Reserved
    - Private
    - Nostalgic
    - Introverted
    - Emotional
    - Melancholic
    """,
        },
        {
            "human": "Donald Trump",
            "ai": """
    ### Positive Characteristics:
    - Confident
    - Charismatic
    - Ambitious
    - Determined
    - Persuasive
    - Bold
    - Resilient
    - Strategic

    ### Negative Characteristics:
    - Controversial
    - Impulsive
    - Stubborn
    - Egotistical
    - Blunt
    - Divisive
    - Arrogant
    - Combative
    """,
        },
        {
            "human": "Dick Schoof",
            "ai": """
    ### Positive Characteristics:
    - Analytical
    - Disciplined
    - Responsible
    - Strategic
    - Resilient
    - Diligent
    - Calm
    - Methodical

    ### Negative Characteristics:
    - Reserved
    - Strict
    - Cautious
    - Stern
    - Unyielding
    - Critical
    - Detached
    - Perfectionist
    """,
    },
        {
            "human": "Yanis Varoufakis",
            "ai": """
    ### Positive Characteristics:
    - Intelligent
    - Charismatic
    - Articulate
    - Bold
    - Innovative
    - Strategic
    - Passionate
    - Insightful

    ### Negative Characteristics:
    - Controversial
    - Stubborn
    - Eccentric
    - Confrontational
    - Outspoken
    - Radical
    - Uncompromising
    - Impulsive
    """,
        },
    ]



    # This is a prompt template used to format each individual example.
    example_prompt = ChatPromptTemplate.from_messages(
        [
            ("human", "{human}"),
            ("ai", "{ai}"),
        ]
    )
    few_shot_prompt = FewShotChatMessagePromptTemplate(
        example_prompt=example_prompt,
        examples=examples,
    )

    final_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", """You are a good analyst of positive and negative characteristics. 
             """),
            few_shot_prompt,
            ("human", "{human}"),
        ]
    )


    chain = final_prompt | llm

    return chain.invoke({"human": f"{person}"})

person = "Vincent van Gogh"
response = generate_characteristics(person)
response




In [13]:
def replace_name(text, person, new_name="Person X"):
    """
    This function replaces all instances of `person` with `new_name` in the provided text.

    Parameters:
    text (str): The text in which to replace the name.
    person (str): The name to be replaced. 
    new_name (str): The name to replace with. Default is "Person X".

    Returns:
    str: The text with the name replaced.
    """
    return text.replace(person, new_name)

# Example usage (disabled in this run):
# response = replace_name(response, person)
# response
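
`str.replace` only catches the full name exactly as given, so a text that refers to the person by surname alone would still leak it. A sketch of a more thorough variant that treats any run of name parts as one mention; `redact_name` is a hypothetical helper, and the sample text is made up.

```python
import re


def redact_name(text, person, new_name="Person X"):
    """Replace the full name and any run of its parts with `new_name`."""
    parts = [re.escape(part) for part in person.split()]
    if not parts:
        return text
    one_part = "(?:" + "|".join(parts) + ")"
    # One or more name parts separated by whitespace count as a single mention
    pattern = rf"\b{one_part}(?:\s+{one_part})*\b"
    return re.sub(pattern, new_name, text, flags=re.IGNORECASE)


sample = "Mark Rutte is calm. Critics say Rutte avoids confrontation."
redacted = redact_name(sample, "Mark Rutte")
print(redacted)
```

Name parts that are also ordinary words (for example "van" or "mark") would be replaced wherever they occur, so the result should be spot-checked.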

In [14]:
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

def characteristics_to_zodiac(characteristics):

    examples = [
        {
            "human": """
    **Positive characteristics:**
    - pioneering
    - executive
    - competitive
    - impulsive
    - eager
    - courageous
    - independent
    - dynamic
    - lives in present
    - quick

    **Negative characteristics:**
    - domineering
    - quick-tempered
    - violent
    - intolerant
    - hasty
    - arrogant
    - "me first"
    - brusque
    - lacks follow-through""",
            "ai": "Aries",
        },
        {
            "human": """

    **Positive characteristics:**
    - patient
    - conservative
    - domestic
    - sensual
    - thorough
    - stable
    - dependable
    - practical
    - artistic
    - loyal

    **Negative characteristics:**
    - self-indulgent
    - stubborn
    - slow-moving
    - argumentative
    - short-tempered
    - possessive
    - greedy
    - materialistic""",
            "ai": "Taurus",
        },
        {
            "human": """

    **Positive characteristics:**
    - dual
    - congenial
    - curious
    - adaptable
    - expressive
    - quick-witted
    - literary
    - inventive
    - dextrous
    - clever

    **Negative characteristics:**
    - changeable
    - ungrateful
    - scatterbrained
    - restless
    - scheming
    - lacking in concentration
    - lacking in follow-through""",
            "ai": "Gemini",
        },
        {
            "human": """

    **Positive characteristics:**
    - tenacious
    - intuitive
    - maternal
    - domestic
    - sensitive
    - retentive
    - helpful
    - sympathetic
    - emotional
    - patriotic
    - good memory
    - traditional

    **Negative characteristics:**
    - brooding
    - touchy
    - too easily hurt
    - negative
    - manipulative
    - too cautious
    - lazy
    - selfish
    - sorry for self""",
            "ai": "Cancer",
        },
        {
            "human": """

    **Positive characteristics:**
    - dramatic
    - idealistic
    - proud
    - ambitious
    - creative
    - dignified
    - romantic
    - generous
    - self-assured
    - optimistic

    **Negative characteristics:**
    - vain
    - status conscious
    - childish
    - overbearing
    - fears ridicule
    - cruel
    - boastful
    - pretentious
    - autocratic""",
            "ai": "Leo",
        },
        {
            "human": """

    **Positive characteristics:**
    - industrious
    - studious
    - scientific
    - methodical
    - discriminating
    - fact-finding
    - exacting
    - clean
    - humane
    - seeks perfection

    **Negative characteristics:**
    - critical
    - petty
    - melancholy
    - self-centered
    - fears disease and poverty
    - picky
    - pedantic
    - skeptical
    - sloppy""",
            "ai": "Virgo",
        },
        {
            "human": """

    **Positive characteristics:**
    - cooperative
    - persuasive
    - companionable
    - peace-loving
    - refined
    - judicial
    - artistic
    - diplomatic
    - sociable
    - suave

    **Negative characteristics:**
    - fickle
    - apathetic
    - loves intrigue
    - peace at any price
    - pouting
    - indecisive
    - easily deterred""",
            "ai": "Libra",
        },
        {
            "human": """

    **Positive characteristics:**
    - motivated
    - penetrating
    - executive
    - resourceful
    - determined
    - scientific
    - investigative
    - probing
    - passionate
    - aware

    **Negative characteristics:**
    - vengeful
    - temperamental
    - secretive
    - overbearing
    - violent
    - sarcastic
    - suspicious
    - jealous
    - intolerant""",
            "ai": "Scorpio",
        },
        {
            "human": """

    **Positive characteristics:**
    - straightforward
    - philosophical
    - freedom-loving
    - broadminded
    - athletic
    - generous
    - optimistic
    - just
    - religious
    - scholarly
    - enthusiastic

    **Negative characteristics:**
    - argumentative
    - exaggerative
    - talkative
    - procrastinating
    - self-indulgent
    - blunt
    - impatient
    - a gambler
    - pushy
    - hot-headed""",
            "ai": "Sagittarius",
        },
        {
            "human": """

    **Positive characteristics:**
    - cautious
    - responsible
    - scrupulous
    - conventional
    - businesslike
    - perfectionist
    - traditional
    - practical
    - hardworking
    - economical
    - serious

    **Negative characteristics:**
    - egotistic
    - domineering
    - unforgiving
    - fatalistic
    - the mind rules the heart
    - stubborn
    - brooding
    - inhibited
    - status-seeking""",
            "ai": "Capricorn",
        },
        {
            "human": """

    **Positive characteristics:**
    - independent
    - inventive
    - tolerant
    - individualistic
    - progressive
    - artistic
    - scientific
    - logical
    - humane
    - intellectual
    - altruistic

    **Negative characteristics:**
    - unpredictable
    - temperamental
    - bored by detail
    - cold
    - too fixed in opinions
    - shy
    - eccentric
    - radical
    - impersonal
    - rebellious""",
            "ai": "Aquarius",
        },
        {
            "human": """

    **Positive characteristics:**
    - compassionate
    - charitable
    - sympathetic
    - emotional
    - sacrificing
    - intuitive
    - introspective
    - musical
    - artistic

    **Negative characteristics:**
    - procrastinating
    - over-talkative
    - melancholy
    - pessimistic
    - emotionally inhibited
    - timid
    - impractical
    - indolent
    - often feels misunderstood""",
            "ai": "Pisces",
        },
    ]



    # This is a prompt template used to format each individual example.
    example_prompt = ChatPromptTemplate.from_messages(
        [
            ("human", "{human}"),
            ("ai", "{ai}"),
        ]
    )
    few_shot_prompt = FewShotChatMessagePromptTemplate(
        example_prompt=example_prompt,
        examples=examples,
    )

    final_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", "You are a good astrologer. Make sure to answer just in one word."),
            few_shot_prompt,
            ("human", "{human}"),
        ]
    )


    chain = final_prompt | llm

    return chain.invoke({"human": f"{characteristics}"})

response = characteristics_to_zodiac(response)
response

'response = characteristics_to_zodiac(response)\nresponse'
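
To make clearer what the few-shot prompt above sends to the model, here is a minimal sketch in plain Python, without LangChain. The helper `build_few_shot_messages` and the two shortened examples are made up for illustration. The sketch only mirrors the idea that each example becomes a human turn followed by an AI turn, placed between the system message and the real question.

```python
def build_few_shot_messages(system_text, examples, query):
    """Return the (role, content) pairs that a few-shot chat prompt expands to."""
    messages = [("system", system_text)]
    for example in examples:
        # Each example becomes one human turn followed by the expected AI turn.
        messages.append(("human", example["human"]))
        messages.append(("ai", example["ai"]))
    # The real query comes last, so the model continues the established pattern.
    messages.append(("human", query))
    return messages


# Two shortened, illustrative examples (the notebook uses one per sign).
demo_examples = [
    {"human": "- dramatic\n- proud\n- vain", "ai": "Leo"},
    {"human": "- methodical\n- studious\n- picky", "ai": "Virgo"},
]

demo_messages = build_few_shot_messages(
    "You are a good astrologer. Make sure to answer just in one word.",
    demo_examples,
    "- cautious\n- hardworking\n- stubborn",
)

for role, content in demo_messages:
    print(f"{role}: {content!r}")
```

Because every example answer is a single word, the model is nudged to reply with a single sign name as well.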

So, let's come back to the data frame that we made with the Sun signs and the ascendants, and see if we can again generate predicted astrological signs, but now with the improved methods.

In [178]:
final_df_sorted.head()

Unnamed: 0,Name,Sign,Ascendant
520,Virginia Woolf,Aquarius,Gemini
1593,Elizabeth Olsen,Aquarius,Virgo
1590,Ariel Winter,Aquarius,Virgo
3765,Maluma (singer),Aquarius,Aquarius
1560,Lisa Marie Presley,Aquarius,Leo


Let's again add two columns in which we will put the characteristics and the predicted astrological sign.

But the generation time is long: 

Progress: 0/555
the average generating time is 28.495609045028687, we have 555 more to go, so the expected time is 4h 23m 35s
Progress: 1/555
the average generating time is 24.79427206516266, we have 554 more to go, so the expected time is 3h 48m 56s
Progress: 2/555
the average generating time is 24.696346362431843, we have 553 more to go, so the expected time is 3h 47m 37s
Progress: 3/555

So we first do it for just 100 rows of the data frame: we draw a random sample of 100 rows, save it as a CSV, and work with that file from now on.

In [17]:
import pandas as pd

# Load the full DataFrame with names, Sun signs and ascendants
final_df_sorted = pd.read_csv('final_df_sorted.csv')

# Take a random sample of 100 rows
sample_df = final_df_sorted.sample(n=100, random_state=1)  # random_state is used for reproducibility

# Save the sample to a CSV file
sample_df.to_csv('sample_of_100.csv', index=False)

# Load the sample CSV file back into a DataFrame
loaded_sample_df = pd.read_csv('sample_of_100.csv')

print(loaded_sample_df.head())

            Name    Sign Ascendant
0    Hilary Duff   Libra  Aquarius
1    Ann-Margret  Taurus    Taurus
2  Ewan McGregor   Aries     Libra
3    Jaden Smith  Cancer   Scorpio
4   Sharon Stone  Pisces     Virgo


In [18]:
import time

# Initialize the two new columns with default values (None)
loaded_sample_df['Characteristics'] = None
loaded_sample_df['Predicted'] = None


length = 100

def format_time(seconds):
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    seconds = int(seconds % 60)
    return f"{hours}h {minutes}m {seconds}s"

# Loop through the sample DataFrame using basic indexing
start_time = time.time()
time_stamps = []
for i in range(length):
    start_generating = time.time()
    name = loaded_sample_df.iloc[i, 0]  # Get the name from the first column of the sample

    # report which generation we are on
    print(f"Progress: {i}/{length}")

    # generate characteristics
    characteristics = generate_characteristics(name)

    # replace name
    characteristics = replace_name(characteristics, name)

    # generate zodiac sign
    zodiac = characteristics_to_zodiac(characteristics)

    loaded_sample_df.iloc[i, 3] = characteristics  # Set the depersonalized traits in the fourth column
    loaded_sample_df.iloc[i, 4] = zodiac  # Set the predicted sign in the fifth column


    # save elapsed time 
    elapsed_time = time.time() - start_time

    # save the time that the generating took
    time_after_generating = time.time() - start_generating

    # add generating time to a list of generating times
    time_stamps.append(time_after_generating)

    # calculate average generating time
    average_generating_time = sum(time_stamps) / len(time_stamps)

    # generations to go (still counts the row just finished, so the estimate runs one row high)
    to_go = length - i

    # expected time
    expected_time = format_time(to_go * average_generating_time)

    # print expected time
    print(f"the average generating time is {average_generating_time}, we have {to_go} more to go, so the expected time is {expected_time}")




# Display the updated DataFrame
print(loaded_sample_df.head())

# Saving the DataFrame to a CSV file
csv_file_path = 'first_100_zodiacs.csv'
loaded_sample_df.to_csv(csv_file_path, index=False)

Progress: 0/100
the average generating time is 30.684810876846313, we have 100 more to go, so the expected time is 0h 51m 8s
Progress: 1/100
the average generating time is 26.38871943950653, we have 99 more to go, so the expected time is 0h 43m 32s
Progress: 2/100
the average generating time is 25.29471222559611, we have 98 more to go, so the expected time is 0h 41m 18s
Progress: 3/100
the average generating time is 24.78972464799881, we have 97 more to go, so the expected time is 0h 40m 4s
Progress: 4/100
the average generating time is 25.635318946838378, we have 96 more to go, so the expected time is 0h 41m 0s
Progress: 5/100
the average generating time is 25.258229970932007, we have 95 more to go, so the expected time is 0h 39m 59s
Progress: 6/100
the average generating time is 26.1214599609375, we have 94 more to go, so the expected time is 0h 40m 55s
Progress: 7/100
the average generating time is 26.60558184981346, we have 93 more to go, so the expected time is 0h 41m 14s
Progress
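
Since a full run takes this long, it may be worth writing each result to disk as soon as it is generated, so that an interrupted run can be resumed instead of restarted. Below is a minimal sketch of that idea using only the standard library. The function names, the `predict` argument (a stand-in for the chain of `generate_characteristics`, `replace_name` and `characteristics_to_zodiac`) and the two-column file layout are assumptions for illustration, not what the loop above does.

```python
import csv
import os


def load_done(path):
    """Return the set of names already saved in the checkpoint file."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="", encoding="utf-8") as handle:
        return {row["Name"] for row in csv.DictReader(handle)}


def run_with_checkpoint(names, predict, path):
    """Call predict(name) for each unfinished name and append the result at once."""
    done = load_done(path)
    # Only a new or empty file needs the header row.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["Name", "Predicted"])
        if write_header:
            writer.writeheader()
        for name in names:
            if name in done:
                continue  # finished in an earlier run
            writer.writerow({"Name": name, "Predicted": predict(name)})
            handle.flush()  # push the row to the file before the next slow call
```

Calling `run_with_checkpoint` a second time with the same file skips every name that is already stored, so only the remaining rows cost generation time.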

Before analysing the accuracy, we first clean the sign columns.

In [30]:
# Define the list of valid astrological signs
valid_signs = [
    'aries', 'taurus', 'gemini', 'cancer', 'leo', 'virgo',
    'libra', 'scorpio', 'sagittarius', 'capricorn', 'aquarius', 'pisces'
]

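# Function to clean the signs
def clean_sign(sign):
    # Missing values (None/NaN) are not strings and cannot be cleaned
    if not isinstance(sign, str):
        return None
    sign = sign.lower().strip()
    if sign in valid_signs:
        return sign
    return None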

# Apply the cleaning function to the 'Sign' and 'Predicted' columns
loaded_sample_df['Sign'] = loaded_sample_df['Sign'].apply(clean_sign)
# loaded_sample_df['Predicted'] = loaded_sample_df['Predicted'].apply(clean_sign)
loaded_sample_df['Ascendant'] = loaded_sample_df['Ascendant'].apply(clean_sign)
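
Even with the instruction to answer in one word, a model may return something like " Leo." or a short sentence, which `clean_sign` would turn into `None`. The sketch below shows one possible way to pull a sign name out of such free-form text. The helper `extract_sign` and its rule of rejecting answers that name several different signs are my own assumptions, not part of the notebook's pipeline.

```python
import re

SIGNS = [
    "aries", "taurus", "gemini", "cancer", "leo", "virgo",
    "libra", "scorpio", "sagittarius", "capricorn", "aquarius", "pisces",
]
# Whole-word match so that, for example, "leonine" does not count as "leo".
SIGN_PATTERN = re.compile(r"\b(" + "|".join(SIGNS) + r")\b", re.IGNORECASE)


def extract_sign(raw_output):
    """Return the single sign named in raw_output, or None if none or several."""
    if not isinstance(raw_output, str):
        return None
    found = {match.lower() for match in SIGN_PATTERN.findall(raw_output)}
    return found.pop() if len(found) == 1 else None


print(extract_sign(" Leo."))
print(extract_sign("The sign is most likely Virgo"))
print(extract_sign("Aries or Taurus"))
```

Ambiguous answers stay `None` on purpose, so they count as misses rather than as lucky guesses.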

What is the significance of this score? We can find out by randomly assigning zodiac signs in place of the Sun sign and the Ascendant and then computing the accuracy score of the predictions against these random signs. We repeat this 100 times and average the accuracy scores. If that average is about the same as the score we obtained, the predictions do no better than chance.

In [37]:
import numpy as np

accuracy_sun_list = []
accuracy_ascendant_list = []

for _ in range(100):

    # Randomly assign zodiac signs to the 'Random_Sign' and 'Random_Ascendant' columns
    loaded_sample_df['Random_Sign'] = np.random.choice(valid_signs, size=len(loaded_sample_df))
    loaded_sample_df['Random_Ascendant'] = np.random.choice(valid_signs, size=len(loaded_sample_df))
    
    # accuracy of the predictions against the random sun signs
    accuracy_sun_sign = loaded_sample_df['Random_Sign'] == loaded_sample_df['Predicted']
    accuracy_score_sun = sum(accuracy_sun_sign) / len(loaded_sample_df)
    accuracy_sun_list.append(accuracy_score_sun)

    # accuracy of the predictions against the random ascendant signs
    accuracy_ascendant = loaded_sample_df['Random_Ascendant'] == loaded_sample_df['Predicted']
    accuracy_score_ascendant = sum(accuracy_ascendant) / len(loaded_sample_df)
    accuracy_ascendant_list.append(accuracy_score_ascendant)

print(f'the accuracy score for sun signs with randomly distributed signs is {sum(accuracy_sun_list)/ len(accuracy_sun_list)}')
print(f'the accuracy score for ascendant signs with randomly distributed signs is {sum(accuracy_ascendant_list)/ len(accuracy_ascendant_list)}')



the accuracy score for sun signs with randomly distributed signs is 0.07349999999999998
the accuracy score for ascendant signs with randomly distributed signs is 0.07259999999999996
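
For reference, drawing signs uniformly at random should match a valid prediction in 1 out of 12 cases, so about 0.083. A related check that also yields a p-value is a permutation test: instead of drawing new signs, shuffle the real signs among the celebrities and count how often the shuffled labels score at least as well as the real ones. This keeps the actual distribution of signs in the sample intact. The sketch below is a minimal version using only the standard library. The function names and the toy data are made up for illustration and are not results from this notebook.

```python
import random


def accuracy(true_signs, predicted_signs):
    """Fraction of positions where the prediction equals the true sign."""
    hits = sum(t == p for t, p in zip(true_signs, predicted_signs))
    return hits / len(true_signs)


def permutation_p_value(true_signs, predicted_signs, n_permutations=10_000, seed=0):
    """Estimate how often shuffled labels score at least as well as the real ones."""
    rng = random.Random(seed)
    observed = accuracy(true_signs, predicted_signs)
    shuffled = list(true_signs)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if accuracy(shuffled, predicted_signs) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


# Toy data, invented for illustration only.
toy_true = ["leo", "virgo", "aries", "leo", "pisces", "cancer"]
toy_pred = ["leo", "libra", "aries", "gemini", "pisces", "leo"]

toy_accuracy, toy_p = permutation_p_value(toy_true, toy_pred, n_permutations=2_000)
print(f"accuracy = {toy_accuracy:.2f}, permutation p-value = {toy_p:.3f}")
```

On the notebook's data, the same function could be called with `loaded_sample_df['Sign'].tolist()` and `loaded_sample_df['Predicted'].tolist()`. A small p-value would mean that the real pairing of celebrities and signs scores better than most shuffled pairings.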
