<a href="https://colab.research.google.com/github/srenna/moonshot_finder/blob/main/moonshot_sr.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [5]:
import pandas as pd

In [6]:
encodings_to_try = ['utf-8', 'latin-1', 'ISO-8859-1']

for encoding in encodings_to_try:
    try:
        df = pd.read_csv('AI EarthHack Dataset.csv', encoding=encoding)
        break
    except UnicodeDecodeError:
        continue

In [7]:
# check all the data is there
len(df)

# look for null values
df.isnull().sum()

# drop null values
df = df.dropna()


In [8]:
df.head()

Unnamed: 0,id,problem,solution
0,1,The construction industry is indubitably one o...,"Herein, we propose an innovative approach to m..."
1,2,"I'm sure you, like me, are feeling the heat - ...","Imagine standing on a green hill, not a single..."
2,3,The massive shift in student learning towards ...,"Implement a """"Book Swap"""" program within educa..."
3,4,The fashion industry is one of the top contrib...,The proposed solution is a garment rental serv...
4,5,The majority of the materials used in producin...,An innovative concept would be a modular elect...


In [9]:
# use chain of thought prompting
# feed the problem first
test_prob = df['problem'][0]
test_sol = df['solution'][0]

print(test_prob)

The construction industry is indubitably one of the significant contributors to global waste, contributing approximately 1.3 billion tons of waste annually, exerting significant pressure on our landfills and natural resources. Traditional construction methods entail single-use designs that require frequent demolitions, leading to resource depletion and wastage.   


## Falcon 7B Model - by TII
---
1. Model Name: Falcon-7b-instruct
2. Model Parameters: 7 Billion
3. Training: Instruction-tuned Model
4. Link: https://huggingface.co/tiiuae/falcon-7b-instruct
---

In [10]:
# install dependencies
!pip install transformers
!pip install einops
!pip install accelerate

Collecting einops
  Downloading einops-0.7.0-py3-none-any.whl (44 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m44.6/44.6 kB[0m [31m751.7 kB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: einops
Successfully installed einops-0.7.0
Collecting accelerate
  Downloading accelerate-0.25.0-py3-none-any.whl (265 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m265.7/265.7 kB[0m [31m2.5 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: accelerate
Successfully installed accelerate-0.25.0


In [11]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

# load model
model = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model)

falcon_pipeline = transformers.pipeline("text-generation",
                                        model=model,
                                        tokenizer=tokenizer,
                                        torch_dtype=torch.bfloat16,
                                        trust_remote_code=True,
                                        device_map="auto"
                                        )

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/287 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.73M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/281 [00:00<?, ?B/s]

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/1.05k [00:00<?, ?B/s]

configuration_falcon.py:   0%|          | 0.00/7.16k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b-instruct:
- configuration_falcon.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.



modeling_falcon.py:   0%|          | 0.00/56.9k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b-instruct:
- modeling_falcon.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


pytorch_model.bin.index.json:   0%|          | 0.00/16.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

pytorch_model-00001-of-00002.bin:   0%|          | 0.00/9.95G [00:00<?, ?B/s]

pytorch_model-00002-of-00002.bin:   0%|          | 0.00/4.48G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/117 [00:00<?, ?B/s]

In [15]:
# define completion function
def get_completion_falcon(input):
  system = f"""
  You are an expert venture capital (VC) expert.
  You are good at looking at a stack of potential startup pitches to evaluate innovative circular economy business opportunities that were crowdsourced from an exciting innovation contest.
  The term "moonshot" is often used to describe an ambitious, groundbreaking, and seemingly impossible project or goal.
  The term "incremental" will be use to describe less ambitious, more feasible ideas.
  Identify each idea as 'moonshot' or 'incremental'.
  """
  prompt = f"#### System: {system}\n#### User: \n{input}\n\n#### Response from falcon-7b-instruct:"
  print(prompt)
  falcon_response = falcon_pipeline(prompt,
                                    max_length=500,
                                    do_sample=True,
                                    top_k=10,
                                    num_return_sequences=1,
                                    eos_token_id=tokenizer.eos_token_id,
                                    )

  return falcon_response

In [16]:
# let's prompt
# prompt = "Explain to me the difference between nuclear fission and fusion."
# prompt = "Why is the Sky blue?"
prompt = test_sol
response = get_completion_falcon(prompt)
print(response[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


#### System: 
  You are an expert venture capital (VC) expert.
  You are good at looking at a stack of potential startup pitches to evaluate innovative circular economy business opportunities that were crowdsourced from an exciting innovation contest.
  The term "moonshot" is often used to describe an ambitious, groundbreaking, and seemingly impossible project or goal. 
  The term "increment" will be use to describe less ambitious, more feasible ideas.
  Identify each idea as 'moonshot' or 'incremental'. 
  
#### User: 
Herein, we propose an innovative approach to mitigate this problem: Modular Construction. This method embraces recycling and reuse, taking a significant stride towards a circular economy.   Modular construction involves utilizing engineered components in a manufacturing facility that are later assembled on-site. These components are designed for easy disassembling, enabling them to be reused in diverse projects, thus significantly reducing waste and conserving resources

In [19]:
# let's prompt
# prompt = "Explain to me the difference between nuclear fission and fusion."
# prompt = "Why is the Sky blue?"
prompt = '''Herein, we propose an innovative approach to mitigate this problem: Modular Construction.
This method embraces recycling and reuse, taking a significant stride towards a circular economy.
  Modular construction involves utilizing engineered components in a manufacturing facility that are
  later assembled on-site. These components are designed for easy disassembling, enabling them to be reused
  in diverse projects, thus significantly reducing waste and conserving resources.
  Not only does this method decrease construction waste by up to 90%, but it also decreases
  construction time by 30-50%, optimizing both environmental and financial efficiency.
  This reduction in time corresponds to substantial financial savings for businesses.
  Moreover, the modular approach allows greater flexibility, adapting to changing needs over time.
  We believe, by adopting modular construction, the industry can transit from a 'take, make and dispose'
  model to a more sustainable 'reduce, reuse, and recycle' model, driving the industry towards a more circular
  and sustainable future. The feasibility of this concept is already being proven in markets around the globe,
  indicating its potential for scalability and real-world application. Is this a moonshot idea or incremental?'''
response = get_completion_falcon(prompt)
print(response[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


#### System: 
  You are an expert venture capital (VC) expert.
  You are good at looking at a stack of potential startup pitches to evaluate innovative circular economy business opportunities that were crowdsourced from an exciting innovation contest.
  The term "moonshot" is often used to describe an ambitious, groundbreaking, and seemingly impossible project or goal. 
  The term "increment" will be use to describe less ambitious, more feasible ideas.
  Identify each idea as 'moonshot' or 'incremental'. 
  
#### User: 
Herein, we propose an innovative approach to mitigate this problem: Modular Construction. 
This method embraces recycling and reuse, taking a significant stride towards a circular economy. 
  Modular construction involves utilizing engineered components in a manufacturing facility that are 
  later assembled on-site. These components are designed for easy disassembling, enabling them to be reused 
  in diverse projects, thus significantly reducing waste and conserving r

In [20]:
prompt = '''Imagine standing on a green hill, not a single towering, noisy windmill in sight, and yet,
you're surrounded by wind power generation! Using existing, yet under-utilized technology,
I propose a revolutionary approach to harness wind energy on a commercial scale, without those
""monstrously large and environmentally damaging windmills"".
With my idea, we could start construction tomorrow and give our electrical grid the jolt it needs,
creating a future where clean, quiet and efficient energy isn't a dream, but a reality we live in.
This is not about every home being a power station, but about businesses driving a green revolution
from the ground up! Is this a moonshot idea or incremental? Is this idea moonshot or incremental?'''
response = get_completion_falcon(prompt)
print(response[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


#### System: 
  You are an expert venture capital (VC) expert.
  You are good at looking at a stack of potential startup pitches to evaluate innovative circular economy business opportunities that were crowdsourced from an exciting innovation contest.
  The term "moonshot" is often used to describe an ambitious, groundbreaking, and seemingly impossible project or goal. 
  The term "increment" will be use to describe less ambitious, more feasible ideas.
  Identify each idea as 'moonshot' or 'incremental'. 
  
#### User: 
Imagine standing on a green hill, not a single towering, noisy windmill in sight, and yet, 
you're surrounded by wind power generation! Using existing, yet under-utilized technology, 
I propose a revolutionary approach to harness wind energy on a commercial scale, without those 
""monstrously large and environmentally damaging windmills"". 
With my idea, we could start construction tomorrow and give our electrical grid the jolt it needs, 
creating a future where clean, 

In [21]:
prompt = '''Implement a ""Book Swap"" program within educational institutions and local communities.
This platform allows students to trade books they no longer need with others who require them,
reducing the need for new book production and hence, lowering the rate of resource depletion.
Furthermore, the platform could have a digital component to track book exchanges, giving users credits
for each trade, which they can accrue and redeem. This system encourages and amplifies the benefits of
reusing and sharing resources, thus contributing to the circular economy.   By integrating gamification,
getting students and parents involved and providing an easy-to-use platform, the program could influence
a cultural shift towards greater resource value appreciation and waste reduction.
In terms of the financial aspect, less reliance on purchasing new books could save money for students,
parents and schools. Is this idea moonshot or incremental?'''
response = get_completion_falcon(prompt)
print(response[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


#### System: 
  You are an expert venture capital (VC) expert.
  You are good at looking at a stack of potential startup pitches to evaluate innovative circular economy business opportunities that were crowdsourced from an exciting innovation contest.
  The term "moonshot" is often used to describe an ambitious, groundbreaking, and seemingly impossible project or goal. 
  The term "increment" will be use to describe less ambitious, more feasible ideas.
  Identify each idea as 'moonshot' or 'incremental'. 
  
#### User: 
Implement a ""Book Swap"" program within educational institutions and local communities. This platform allows students to trade books they no longer need with others who require them, reducing the need for new book production and hence, lowering the rate of resource depletion. Furthermore, the platform could have a digital component to track book exchanges, giving users credits for each trade, which they can accrue and redeem. This system encourages and amplifies the be

In [25]:
prompt = '''This is a solution to a problem."Companies can offer products as a service,
where customers pay for access or usage rather than ownership.
This can be done through subscription models or pay-per-use systems."
Is this idea 'moonshot' or 'incremental'?'''
response = get_completion_falcon(prompt)
print(response[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


#### System: 
  You are an expert venture capital (VC) expert.
  You are good at looking at a stack of potential startup pitches to evaluate innovative circular economy business opportunities that were crowdsourced from an exciting innovation contest.
  The term "moonshot" is often used to describe an ambitious, groundbreaking, and seemingly impossible project or goal. 
  The term "increment" will be use to describe less ambitious, more feasible ideas.
  Identify each idea as 'moonshot' or 'incremental'. 
  
#### User: 
This is a solution to a problem."Companies can offer products as a service, 
where customers pay for access or usage rather than ownership. 
This can be done through subscription models or pay-per-use systems." 
Is this idea 'moonshot' or 'incremental'?

#### Response from falcon-7b-instruct:
#### System: 
  You are an expert venture capital (VC) expert.
  You are good at looking at a stack of potential startup pitches to evaluate innovative circular economy business op