In [9]:
from utils.download_pdfs_from_url import download_papers
from utils import icml_parser 
from pathlib import Path
import pickle
from tqdm import tqdm
from IPython.display import Markdown

## Project Parameters

In [2]:
PROJECT_DIR = './examples/icml_2024'

paper_pdf_dir = Path(PROJECT_DIR, 'paper_pdfs')
paper_summaries = Path(PROJECT_DIR, 'paper_summaries')
paper_parsed_dir = Path(PROJECT_DIR, 'paper_parsed.pkl')
paper_summaries_pkl = Path(PROJECT_DIR, 'paper_summaries.pkl')  # {'overview': <>, 'best': [{'tag': <>, 'summary': <>}], 'all': []}
paper_overview = Path(PROJECT_DIR, 'paper_overview.pkl')
best_paper_ids_pkl = Path(PROJECT_DIR, 'best_paper_ids.pkl')
paper_mp3 = Path(PROJECT_DIR, 'mp3') #overview.mp3, <tag>.mp3

## Download and Parse all ICML papers

In [3]:
#_ = download_papers('https://proceedings.mlr.press/v235/', paper_pdf_dir)

#all_papers = icml_parser.parse_folder(paper_pdf_dir)
#with open(paper_parsed_dir, 'wb') as f:
#    pickle.dump(all_papers, f)

## Load ICML papers from pkl

In [3]:
with open(paper_parsed_dir, 'rb') as f:
    all_papers = pickle.load(f)

len(all_papers)

2610

## Generate Paper Summaries

In [8]:
'''
# Google AI Studio parameters 
import time
from utils.paper_ontology import *
import os
from IPython.display import Markdown

genai.configure(api_key="<>")
flash = genai.GenerativeModel('gemini-1.5-flash')
paper_summaries.mkdir(exist_ok=True, parents=True)

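# Iterate over the remaining papers themselves (not a one-element list wrapping them),
# and start the index at 2413 so file names match each paper's position in all_papers.
for i, p in enumerate(tqdm(all_papers[2413:]), start=2413):
    try:
        po = PaperOntology(p, flash)
        ps = po.create_summary()
        fn = Path(paper_summaries, f'ps_{i}.pkl')
        with open(fn, 'wb') as f:
            pickle.dump(ps, f)
    except Exception as e:
        # This handling will miss one paper in the process
        print('deadline', i, e)
        genai.configure(api_key="<>")
        flash = genai.GenerativeModel('gemini-1.5-flash')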
'''

1it [00:07,  7.13s/it]
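The `except` branch in the cell above notes that its handling will miss one paper whenever a call fails. A minimal sketch of one way to avoid that gap is to retry the failed call instead of moving on. The helper name `call_with_retry` and its parameters are hypothetical and not part of `utils.paper_ontology`; the `on_retry` hook is where the model client could be re-created.

```python
import time


def call_with_retry(fn, *args, max_attempts=3, wait_seconds=1.0, on_retry=None):
    """Call fn(*args), retrying on any exception up to max_attempts times.

    on_retry(attempt, error) is an optional hook run before each retry,
    for example to re-configure an API client. Hypothetical helper.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args)
        except Exception as err:  # broad on purpose: API errors vary by client
            last_error = err
            if attempt == max_attempts:
                break
            if on_retry is not None:
                on_retry(attempt, err)
            time.sleep(wait_seconds * attempt)  # simple linear backoff
    raise last_error
```

In the summarization loop this might be used as `ps = call_with_retry(lambda: PaperOntology(p, flash).create_summary())`, so a transient failure retries the same paper rather than skipping it.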


## Generate Overview

In [4]:
summaries = []
for i in range(len(all_papers)):
    fn = Path(paper_summaries, 'ps_'+str(i)+'.pkl')
    with open(fn, 'rb') as f:
        s = pickle.load(f)
    summaries.append(s)
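The loop above assumes every `ps_<i>.pkl` file exists. Because the generation step can skip a paper on failure, a more defensive loader may be useful. The sketch below is an assumption-laden variant, not the notebook's own code: `load_summaries` is a hypothetical name, and it keeps a `None` placeholder for each missing file so list positions still line up with `all_papers`.

```python
import pickle
from pathlib import Path


def load_summaries(summary_dir, n_papers):
    """Load ps_<i>.pkl for i in range(n_papers).

    Returns (summaries, missing): summaries has one entry per paper,
    with None where the file is absent; missing lists those indices.
    """
    summaries, missing = [], []
    for i in range(n_papers):
        fn = Path(summary_dir, f'ps_{i}.pkl')
        if not fn.exists():
            summaries.append(None)  # placeholder keeps indices aligned
            missing.append(i)
            continue
        with open(fn, 'rb') as f:
            summaries.append(pickle.load(f))
    return summaries, missing
```

A call such as `summaries, missing = load_summaries(paper_summaries, len(all_papers))` would then report which papers still need a summary before any overview is generated.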

In [5]:
len(summaries)

2610

In [30]:
with open(paper_summaries_pkl, 'wb') as f:
    pickle.dump(summaries, f)

In [21]:
import google.generativeai as genai
from utils.paper_ontology import create_overview

genai.configure(api_key="<>")
flash = genai.GenerativeModel('gemini-1.5-flash')
o_1 = create_overview(summaries[:700], flash)
Markdown(o_1)

Hello everyone!  Today, we're going to explore some super cool things computer scientists are working on.  It's all about making computers smarter and more helpful!  I'll explain some of the amazing research presented at a recent conference.

**Section 1:  What Computers Can Do Now (Major Use Cases)**

Imagine computers that can design better medicines, predict the weather more accurately, or even help you find exactly what you’re looking for in a huge online store! These are some of the things computer scientists are making possible.  At this conference, researchers showed us how computers are getting better at:

* **Designing new materials:**  Computers can now help predict the shape and properties of tiny materials (nanomaterials), leading to better batteries, medicines, and even stronger buildings.
* **Optimizing business processes:**  Computers are learning to make better decisions all on their own (Reinforcement Learning), helping businesses allocate resources efficiently, improve their supply chains, and personalize marketing campaigns.
* **Understanding images and text:**  Computers can now “understand” both pictures and words, allowing for better image searches, automated descriptions of images, and improved content moderation.
* **Making accurate predictions:**  Computers are getting better at predicting future trends (like sales or stock prices) by analyzing time series data. They’re also making great strides in predicting the properties of molecules, essential for drug discovery.
* **Generating realistic data:**  Computers can create synthetic data—artificial data that mimics real data—including images, videos, text, and sounds. This is helpful for testing, improving existing models, and protecting privacy.
* **Automating complex tasks:**   Computers can use AI to write code, perform various actions on web pages, or even act as virtual assistants that automate many tasks you do every day.
* **Improving health:**   AI is analyzing medical data like EEGs and CT scans to detect illnesses earlier and with greater accuracy, leading to better treatment and improved patient care.
* **Creating 3D models:** Computers are learning to create realistic 3D models of moving objects from videos, which has implications for everything from gaming to robotics to medical imaging.
* **Protecting your data:**  Computer scientists are creating new ways to analyze sensitive data while maintaining strong privacy guarantees using techniques like differential privacy.


**Section 2:  The Problems Computers Are Solving**

These amazing capabilities solve some serious problems:

* **Slow and expensive material development:**  Creating new nanomaterials is currently a slow and expensive process. AI can speed things up and reduce costs.
* **Inefficient decision-making:**  Businesses often struggle to make the best decisions under uncertainty.  AI can help by learning optimal strategies through trial and error.
* **Difficulty in processing large images and text:** Analyzing very large amounts of data is often difficult for computers.  Improved methods allow for efficient processing.
* **Inaccurate predictions:**  Existing forecasting models might be imprecise, leading to poor planning and resource allocation.  AI is improving these predictions.
* **Lack of realistic synthetic data:** Generating realistic simulated environments or data is computationally expensive. Advanced AI is making this more affordable and achievable.
* **Difficulty in automating complex tasks:** Automating things like answering complex questions from data or interacting with web pages is challenging.  New AI models are getting significantly better at doing this.
* **Compromised data privacy:** Many business decisions rely on sensitive customer data.   Improved AI methods enhance privacy protection during model training.
* **Slow and costly model training:** Training large AI models can take a very long time and require lots of computational resources, adding to business costs. This is also a problem when trying to adapt to new tasks, or handle situations where you don't have access to all the model's components ("black-box").  Improved training methods can improve this considerably.
* **Poorly calibrated models (overconfidence):** Highly accurate models can sometimes be overconfident in their predictions, leading to risky decisions.  New methods are improving model reliability.


**Section 3: How Computers Are Solving These Problems**

Computer scientists use many clever techniques:

* **Artificial Intelligence (AI):**  This is the overall field of making computers smarter, encompassing the techniques below.
* **Machine Learning (ML):**  This is like teaching a computer to learn from examples, without being explicitly programmed.
* **Deep Learning:** This is a more advanced type of machine learning, typically involving complex interconnected systems called neural networks.  These neural networks have a similar concept to the structure of the human brain.
* **Reinforcement Learning:** This is like teaching a computer through trial and error using "rewards" for correct answers.
* **Neural Networks:**  These are the mathematical models that act like the "brains" of AI systems.  They are inspired by the network of neurons in the human brain.
* **Graph Neural Networks:**  Specifically for handling data that consists of networks of relationships (like customers and products).
* **Transformers:**  An extremely efficient and powerful type of neural network used for tasks like language processing and image understanding.
* **Diffusion Models:**  A powerful technique for generating images or other data by gradually adding and removing noise from a signal.
* **Optimal Transport:** A mathematical framework for comparing and transforming data distributions.  It is used to understand similarities and differences between datasets.
* **Bayesian Methods:** A way of using probabilistic modeling to quantify and handle uncertainty, making it applicable to situations with limited or incomplete data.  This enables incorporating existing knowledge ("prior" information).
* **Genetic Programming and Evolutionary Algorithms:** These strategies automate the process of improving models, similar to how evolution works.
* **Various Optimization Algorithms:**  Computer scientists use sophisticated math to find the best settings for AI models, including gradient descent, Adam, and other more advanced methods.  These methods are designed to find solutions to complex mathematical problems efficiently.
* **Various statistical techniques:**  Including maximum likelihood estimation, hypothesis testing, and Bayesian inference, depending on the specific problem.
* **Cryptographic methods:** Used to protect sensitive data during analysis using methods like differential privacy and secure multi-party computation.  This ensures the privacy of each individual data contribution.



**Section 4:  The Future of Computer Science**

There's so much more to discover!  Researchers are working on:

* **Making AI more efficient and sustainable:**  Reducing the energy needed to run AI models and reducing their environmental impact.
* **Building more trustworthy and reliable AI:**  Developing methods to ensure AI systems are less prone to errors and biases.
* **Improving AI explainability:**  Making it easier to understand how AI models make their decisions.
* **Developing AI that can learn and adapt continuously:**  Creating AI systems that can learn from new data without "forgetting" what they already know.
* **Expanding AI capabilities:**  Improving the way AI processes images, videos, sound, and other types of data.
* **Creating better tools for automating various tasks:**   This includes developing AI models for tasks like code generation, web automation, and even acting as virtual assistants.
* **Creating better methods for combining various types of data:** This includes methods to combine image and text data for advanced analysis, or combining various time-series data that may be collected at different frequencies (daily, monthly, quarterly).
* **Improving methods for handling large datasets and optimizing AI model performance**:  This involves making models work efficiently with less data and improving the speed and stability of the training process.  It also involves handling "noisy" data (data with errors) and making models less sensitive to these errors.


Computer science is a field full of exciting possibilities!  I hope this overview has sparked your curiosity and maybe even inspired you to learn more.


In [22]:
o_2 = create_overview(summaries[700:1400], flash)
Markdown(o_2)

Hello everyone!  Today, we're going on an exciting adventure into the world of computer science!  We'll explore some of the cool things computer scientists are working on, based on papers from a recent conference.  Think of it like peeking into the future of technology!

**Section 1: Amazing Things Computers Can Do (Use Cases)**

Imagine a world where computers can:

* **See like we do:**  They can look at pictures and understand what’s in them, even if it’s a blurry photo or something they've never seen before!  This is used for things like identifying products in a store, finding defects in manufacturing, or even helping doctors diagnose diseases from medical scans.
* **Understand our language:**  Computers can read and write like humans, understanding the meaning of what we say or type and answering our questions, even very tricky ones. This is used for chatbots, search engines, and writing tools.
* **Make smart decisions:** Computers can learn to make the best choices in complex situations, such as optimizing delivery routes, deciding which ads to show, or even controlling robots!
* **Create new things:** They can generate realistic pictures, music, or even designs for new products!  This is used for creating advertisements, making video games, or even designing drugs.
* **Protect our secrets:**  Computers can help us keep our private information safe when we use AI systems that need access to our data.  We’ll use this to protect medical records, financial information, and other sensitive details.
* **Understand cause and effect:**  Computers can figure out which factors influence other things, even when some information is missing!   This can help businesses figure out what drives customer buying habits, or which factors lead to delays in production.



**Section 2:  Solving Big Problems**

Computer scientists are working hard to solve many challenging problems:

* **Making AI faster and cheaper:**  Training and running AI models can be incredibly expensive and time-consuming. They're finding ways to build faster models that use less energy and need less storage space.
* **Making AI more reliable:**  Sometimes, AI makes mistakes or isn't accurate in all situations.  They're working on making AI systems less prone to error, including when deliberately tricked ("adversarial attacks").
* **Making AI safer:**  They're ensuring AI systems don't cause harm or are used for harmful activities (like making misleading images or spreading false news).
* **Making AI easier to understand:**  Often, it’s hard to understand *why* an AI makes a certain decision.  They're building systems that clearly explain their reasoning.
* **Making AI fairer:**  AI systems need to treat everyone fairly, regardless of race, gender, or other factors.  They're working to eliminate bias and ensure equity in AI systems.
* **Handling missing data:** Real-world data is often messy and incomplete. They’re finding ways to use AI to understand and process data with missing pieces, creating more accurate predictions.
* **Understanding complex relationships in data:** AI needs to efficiently analyze relationships between data points, like those found in social networks, supply chains, or customer interactions.


**Section 3: How They Solve These Problems**

Computer scientists are using many creative tools:

* **Neural Networks:** Imagine a network of interconnected nodes, much like the brain!  These are used for almost all the AI tasks above.
* **Transformers:** These are particularly powerful neural networks capable of understanding and processing sequences of data, like text or images.
* **Reinforcement Learning:**  Teaching AI systems to learn through trial and error, much like how we learn!  They give AI small rewards for good decisions and small penalties for bad ones.
* **Generative Models:** These AI systems can create new data (images, videos, text, etc.) similar to what they've seen before.  It's like having a digital artist that can create unique but realistic images.
* **Optimal Transport:**  A clever way of mathematically moving data points from one configuration to another to help find the most effective solutions. Imagine finding the shortest way to move all the groceries from the store to your home.
* **Mathematics:** They’re using advanced math, including statistics and probability theory, to rigorously analyze these AI systems and find clever solutions to complex problems.  This ensures results are as trustworthy as possible.
* **New evaluation methods:**  They are creating better ways to judge if their AI models are successful, making the development process more reliable.


**Section 4: What's Next?**

The future of computer science is bright!  Researchers are working on:

* **Even faster and more efficient AI:**  Making AI systems that work even better and consume less energy. This is key for both environmental and economic reasons.
* **More reliable and trustworthy AI:**  Reducing errors and improving the accuracy of AI systems, especially in situations where mistakes can have serious consequences.
* **More explainable AI:** Building AI systems that clearly explain their decision-making processes, promoting better understanding, debugging, and ensuring fairness.
* **AI systems that continuously learn:**  Creating AI that can adapt to changing environments and learn new things without forgetting what it already knows.
* **AI that learns to solve problems efficiently:** Helping AI systems learn how to plan and achieve their goals more quickly and effectively, which is crucial for minimizing computing resources and costs.
* **AI that is less vulnerable to attacks:**  Protecting AI systems against deliberate attempts to trick or manipulate them.


That's just a glimpse into the exciting world of computer science!  I hope this has sparked your curiosity, and maybe one day, you'll be the one making these amazing advancements!


In [24]:
o_3 = create_overview(summaries[1400:2100], flash)
Markdown(o_3)

## An Exciting Overview of 2024 AI Research for 5th Graders!

Imagine a world where computers can understand and solve problems almost as well as humans!  That's what computer scientists are working on, and the 2024 conference showed some amazing progress.  Let's explore some of the cool things they're doing.

**Section 1: What can AI do?  Amazing Use Cases!**

This year's conference papers showed AI can do tons of things!  Imagine:

* **Super-smart helpers:**  AI can help us understand complicated problems.  It's like having many expert consultants working together to solve problems faster and more accurately.  This helps businesses like those that offer customer service respond to questions efficiently.
* **Robot helpers:** AI can teach robots to do all sorts of things, even really complicated tasks.  They can learn new things without needing to be reprogrammed every time, which is like having a robot that can help you tidy your room, restock shelves, or even deliver packages, all without requiring a special program for each item.
* **Better predictions:** AI can predict things like how many people will like a new product, when a machine will break, or where a storm will hit.  This helps businesses plan more effectively. This helps in various business sectors such as logistics where predicting the need for supplies and when to restock is critical.
* **Keeping secrets safe:**  AI can help keep your private information safe. This helps companies that operate in highly regulated spaces, such as banks, to protect their customer data while still getting value from it.
* **Making better art:** AI can create incredible art from simple words, helping improve advertising, gaming, and other areas requiring compelling visuals.


**Section 2: What problems are these papers solving?**

AI researchers are working on some tough challenges:

* **Teaching computers new things:**  How can we train AI models quickly and easily for new tasks without forgetting what they already know? This is extremely relevant in many business applications where data needs to be continuously updated and where existing AI models need to be retrained from scratch when new products or business models are added.
* **Handling messy data:**  Real-world data is often incomplete, noisy, or has incorrect labels. How can we train AI models to handle this messy data and still make accurate predictions? Many business data sources are messy and this is critical for accurate interpretations and decisions.
* **Making fast and reliable predictions:**  How can we make sure AI models are not only accurate but also give us a good idea of how certain they are in their predictions?  This helps to better manage risk and avoid mistakes.
* **Making AI safe:** How can we prevent AI systems from being manipulated or doing harmful things?  This is a critical business concern for many high-stakes applications.
* **Keeping secrets safe:**  How can we use powerful AI tools without revealing people's private information? This is particularly important for business operations in regulated spaces.
* **Finding the best solution:** Many business tasks involve a search for the best outcome from a vast number of possibilities. How do we perform this search efficiently and effectively?


**Section 3: How are they solving these problems?  Cool methods!**

Researchers are using some clever techniques:

* **Having many experts:**  Some models combine the power of many smaller, specialized AI models, each an "expert" at different things. They learn which expert to consult for each part of a problem.
* **Breaking down big problems:**  Complex tasks are split into smaller, easier-to-solve parts.  This is like assembling a LEGO castle;  you build small pieces to make a larger whole.
* **Using pictures and words together:**  Models are trained to understand both images and text. This is like combining the visual and written information we use to learn and understand things, creating more capable AI systems.
* **Learning from mistakes:**  Models are taught to learn from their errors, not just correct ones. This improves performance and reliability. This approach focuses on learning the "why" behind failures, rather than only focusing on successes.
* **Adding clever noise:**  Adding noise to data during training helps to make models more robust and to protect private information.
* **Using mathematical shortcuts:** Researchers are using advanced math to make AI models faster and more efficient.  This is like finding a shortcut to a destination, making the AI reach its goals faster with less computation.
* **Leveraging existing expertise:**  Instead of starting completely from scratch, many models benefit from leveraging pre-existing powerful AI models. This is similar to utilizing the knowledge of an expert to improve learning efficiency.


**Section 4: What's next?  The Future of AI!**

There's so much more to discover!  Future research will focus on:

* **Making AI even smarter:**  Developing AI models that can learn and adapt even more quickly and efficiently.
* **Improving AI safety and fairness:**  Ensuring AI systems are reliable, ethical, and unbiased.
* **Making AI more accessible:**  Developing AI solutions that are affordable and easy to use for businesses of all sizes.
* **Building more powerful robots:** Combining the advances in AI to make robots more capable and versatile.
* **Understanding how the brain works:** Learning from the efficiency and complexity of the human brain to develop better and more efficient AI.


AI is a rapidly growing field.  The research presented at this year's conference shows just how much progress is being made.  It's an exciting time to be involved, and who knows what amazing things will be possible in the future!


In [26]:
o_4 = create_overview(summaries[2100:], flash)
Markdown(o_4)

Good morning, everyone! Today, we're going on an exciting adventure into the world of computer science!  We'll explore some amazing things that computer scientists have been working on this year at a big conference.  Think of it like a science fair, but with super-smart computers!

**Section 1:  Cool Things Computers Can Do (Use Cases)**

This year's conference papers showed off some incredible things computers can do! Imagine these uses:

* **Understanding Relationships:**  Computers can now understand complicated relationships between many things at once, not just simple pairs. Imagine understanding how all the parts of a car work together to predict when it might need a repair!  This helps companies plan better and fix problems faster.

* **Grouping Similar Things:** Computers can group similar things together, even if some information is missing.  Think about sorting toys – a computer can group similar toys even if some toys are partially hidden or have missing labels! This helps businesses better understand customers, predict what they might buy, and even detect fraud.

* **Recognizing Images and Words:** Computers can understand both images and words together.  Imagine a computer identifying a "bird" in a picture, even if it's a type of bird it hasn't seen before – this helps businesses categorize and search images more effectively.  It also helps improve the safety of self-driving cars.

* **Building Better Images:** Computers can create very realistic images and videos from just text.  Think of being able to instantly create a picture of "a cat wearing a hat" – this makes advertising and product demonstration easier and quicker.  It’s also crucial for improving medical imaging.

* **Making Better Decisions:**  Computers can make better decisions in uncertain situations, like quickly finding the best route for a delivery truck, especially in situations with unexpected delays or changing demands! This helps businesses save time and money.

* **Understanding Cause and Effect:**  Computers are getting better at understanding why things happen, like determining which part of an ad campaign is most effective or identifying which factors are the most important for a customer's satisfaction. This helps businesses make better, more informed decisions.


**Section 2: Problems Computers are Solving**

These amazing computer tools solve real-world problems:

* **Complex Relationships:** Current tools struggle to handle many things interacting simultaneously.  The new tools can analyze things like interactions in a social network or between parts of a machine to help predict the future.

* **Incomplete Data:** Many datasets have missing information. The new tools can make smart guesses, leading to better predictions.

* **Misleading Information:** AI models can sometimes get confused. The new tools help them focus on what's truly important, reducing errors.

* **Poor Image Reconstruction:** It's difficult to get a clear picture from fuzzy images.  The new tools can rebuild clearer images, helping in everything from medical diagnoses to self-driving cars.

* **Slow and Expensive AI:** Running large AI systems is expensive.  The new techniques make these systems faster and more affordable.

* **Unfairness in AI:**  AI can sometimes be biased.  The new tools help identify and correct this bias, leading to fairer decisions.

* **Privacy Violations:**  Using data requires being mindful of privacy. The new techniques help protect privacy when analyzing data.


**Section 3: How Computers Solve These Problems**

Scientists use clever tricks to make computers solve these problems:

* **Smart Math:**  They use advanced mathematical formulas and techniques like Gaussian Processes and Tensor Decomposition to handle uncertainty and reveal hidden patterns in data.

* **Improved Algorithms:**  They develop clever algorithms (like those used in clustering and classification) that are faster and more accurate.  Some new algorithms are designed to address data incompleteness directly.

* **Neural Networks:** They use neural networks, which are like artificial brains, to learn complex patterns from data. The new neural networks are specifically designed for handling complex relationships and even learn how to better account for uncertainty.

* **Diffusion Models:** They use diffusion models to make computers better at generating realistic images, videos, and even sounds.  These models are making many things like realistic advertising and product demonstration much more efficient.

* **Better ways to handle uncertainty:**  A lot of the work shown in the conference focused on not just making predictions, but also telling us how reliable these predictions are!  This is crucial for making good decisions.

* **Using Large Language Models:**  Large language models (LLMs), similar to the technology in ChatGPT, are used to understand complex relationships and generate explanations, making AI more transparent and trustworthy.


**Section 4: What's Next? (Future Research)**

Computer scientists are always working on making computers even better!  Here are some future directions:

* **Even Faster AI:**  Making AI systems work much faster and more efficiently is a big focus.  This includes reducing the time it takes to train and run these systems, and finding more energy-efficient algorithms.

* **Handling Noisy Data:** Making AI models more robust and reliable, even when the data they receive is inaccurate or incomplete.

* **Explainable AI (XAI):**  Making AI systems more transparent and understandable, so that humans can trust and confidently use them.

* **Fair and Ethical AI:**  Ensuring that AI systems are fair, unbiased, and respect privacy, avoiding discrimination.

* **More powerful generative models:**  Improving computers' ability to produce realistic and high-fidelity images, videos, and other data.

* **Applications to real-world business scenarios:**  The conference showed many great theoretical improvements.  The next steps are to adapt these techniques to specific business problems and get the benefits directly for businesses!

That's just a glimpse of the amazing things happening in computer science!  Who knows, maybe *you* will be the next computer scientist inventing these cool tools!


In [22]:
with open(paper_overview, 'wb') as f:
    pickle.dump(o_1, f)

with open(Path(PROJECT_DIR, 'paper_all_overviews.pkl'), 'wb') as f:
    pickle.dump([o_1, o_2, o_3, o_4], f)
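The four overviews above come from manual slices with a fixed stride of 700 (`summaries[:700]` through `summaries[2100:]`). As a sketch, the same batching could be written as a loop. `chunked` and `create_chunk_overviews` are hypothetical helper names; `make_overview` stands in for `create_overview`, which is assumed here to take `(summaries, model)` as it does in the cells above.

```python
def chunked(items, size):
    """Yield consecutive slices of items, each with at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def create_chunk_overviews(summaries, model, make_overview, chunk_size=700):
    """Build one overview per chunk of summaries.

    make_overview is any callable taking (chunk, model); passing it in
    keeps this sketch independent of utils.paper_ontology.
    """
    return [make_overview(chunk, model) for chunk in chunked(summaries, chunk_size)]
```

With 2610 summaries and `chunk_size=700` this yields four chunks of 700, 700, 700, and 510 items, matching the slices used for `o_1` to `o_4`. A call might look like `overviews = create_chunk_overviews(summaries, flash, create_overview)`.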

In [47]:
o = flash.generate_content(
    '''
    Create a comprehensive overview for ICML 2024 based on a list of four overviews, each overview is from a subset of all the papers. \
    In the first section, talk about major use cases discussed in conference papers. \
    In the second section, talk about what problems are being solved by the papers. \
    In the third section, talk about how are the problems being solved. What methods did the authors use. \
    In the fourth section, talk about future research directions. \
    Overview should be the similar length as individual overview. \
    Here is the list of four overviews: \
    ''' + '\n\n'.join([o_1, o_2, o_3, o_4])
)

In [48]:
Markdown(o.text)

This is a great overview of the exciting advancements in computer science, particularly in the field of AI! Here's a slightly revised version, aiming for a more concise and impactful presentation, suitable for a broader audience, including a 5th-grader:

**The Amazing World of Computer Science: A Glimpse into the Future**

Imagine computers that can see, understand, and even create things almost as well as humans!  That's what computer scientists are working on, and it's leading to some incredible advancements.

**What Computers Can Do Now:**

* **See the world:**  Computers are getting better at "seeing" images and videos, just like we do.  They can identify objects, faces, and even emotions!  This helps doctors find problems in medical images, self-driving cars see the road, and shops keep track of their products.

* **Understand our words:**  Computers can now understand and respond to our language, like a really smart friend!  This helps create chatbots, smart search engines, and even helps write stories.

* **Make Smart Decisions:**  Computers can help us make better choices by analyzing large amounts of information and learning from past experiences. This can help plan the best route for delivery trucks, suggest the right products for us to buy, or even help diagnose diseases.

* **Create Amazing Things:**  Computers can now create impressive things like realistic images and videos, and even compose music!  This is used in making video games, making marketing more effective, and much, much more.

* **Keep our Secrets Safe:**  Computers can help us protect our important information, such as medical and financial records, using clever methods to keep our data private.

**How do they do it?**

These amazing things are made possible through clever techniques like:

* **Super-powered Brains:**  Computers are learning using "neural networks," which are like networks of brains, to learn from large amounts of data, similar to how we learn from experience.
* **Finding Patterns:**  Computers can now find hidden patterns in information that we might miss. This helps them predict things like weather and understand customer behavior.
* **Working Together:**  Just like a team of people, many AI systems work together to solve problems and accomplish goals far more efficiently than a single AI could.
* **Learning from Mistakes:**  Computers can now learn from their mistakes and improve their performance over time. Imagine learning to ride a bike, only much, much faster!


**What's Next? The Future is Bright!**

Computers are getting better at understanding and doing things that humans do, even better!  Scientists are working on making AI even smarter, more reliable, and safer. We can also expect AI to play a bigger role in many areas, including healthcare, transportation, and entertainment, making our lives easier and more efficient.  It's an exciting time to see what comes next!


In [20]:
flash.count_tokens(str(summaries[0:1000]))

total_tokens: 984626

In [19]:
99031*(2610/100)

2584709.1
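
The two cells above amount to a linear extrapolation of a token count from a sample to the full corpus. A minimal sketch of that arithmetic follows. It assumes that 99031 is the token count measured on a 100-summary sample (the cell that produced it is not shown). The helper names `estimate_total_tokens` and `batches_needed` and the 1,000,000-token per-request budget are illustrative assumptions, not part of this notebook.

```python
import math

def estimate_total_tokens(sample_tokens: int, sample_size: int, total_size: int) -> float:
    """Linearly extrapolate a token count measured on a sample to the full set."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return sample_tokens * (total_size / sample_size)

def batches_needed(total_tokens: float, token_budget: int) -> int:
    """Number of requests needed if each request may carry at most token_budget tokens."""
    return math.ceil(total_tokens / token_budget)

# 99031 tokens for 100 summaries (assumed), scaled to 2610 papers
estimated = estimate_total_tokens(99031, 100, 2610)
# Assumed budget of 1,000,000 tokens per request
n_batches = batches_needed(estimated, 1_000_000)
print(round(estimated, 1), n_batches)
```

Under these assumptions the summaries do not fit into a single request, which is consistent with counting tokens on a 1000-summary slice before generating the overview.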

In [6]:
from collections import defaultdict

# Extract best papers 
best_papers = {
    'Best Paper': ['Debating with more persuasive LLMs leads',
                   'VideoPoet',
                   'Information Complexity of Stochastic Convex Optimization:',
                   'Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo',
                   'Position: Measure Dataset Diversity',
                   'Genie: Generative Interactive Environments',
                   'Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution',
                   'Scaling Rectified Flow Transformers for High-Resolution Image Synthesis',
                   'Stealing part of a production language model'
                  ]
}

best_paper_ids = defaultdict(list)
for i,p in enumerate(all_papers): 
    for best_key in best_papers:
        for best_title in best_papers[best_key]:
            if p['title'].lower().startswith(best_title.lower()):
                best_paper_ids[best_key].append(i)

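# Check that the number of matched papers equals the number of listed titles
for best_key in best_papers:
    assert len(best_papers[best_key]) == len(best_paper_ids[best_key]), \
        f'{best_key}: expected {len(best_papers[best_key])} matches, found {len(best_paper_ids[best_key])}'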

with open(best_paper_ids_pkl, 'wb') as f:
    pickle.dump(best_paper_ids, f)
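
The count check above can pass even when one prefix matches two titles and another matches none. A small sketch of a per-prefix report is shown below; `match_title_prefixes` and `report_problems` are hypothetical helpers, and the titles are toy data rather than the real ICML list.

```python
from collections import defaultdict

def match_title_prefixes(titles, prefixes):
    """Map each prefix to the indices of titles that start with it (case-insensitive)."""
    matches = defaultdict(list)
    for idx, title in enumerate(titles):
        lowered = title.lower()
        for prefix in prefixes:
            if lowered.startswith(prefix.lower()):
                matches[prefix].append(idx)
    return matches

def report_problems(prefixes, matches):
    """Return prefixes with no match and prefixes with more than one match."""
    missing = [p for p in prefixes if len(matches.get(p, [])) == 0]
    ambiguous = [p for p in prefixes if len(matches.get(p, [])) > 1]
    return missing, ambiguous

# Toy data for illustration only; not the real title list.
toy_titles = [
    "VideoPoet: A Large Language Model",
    "Genie: Generative Interactive Environments",
    "genie: a second paper",
]
toy_prefixes = ["VideoPoet", "Genie", "Not In The List"]
toy_matches = match_title_prefixes(toy_titles, toy_prefixes)
missing, ambiguous = report_problems(toy_prefixes, toy_matches)
print("missing:", missing, "ambiguous:", ambiguous)
```

Listing the missing and ambiguous prefixes makes it easier to see which entry in `best_papers` needs a longer or corrected prefix.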

## Generate mp3 for overview and best papers

In [8]:
from utils.paper_ontology import create_mp3
paper_mp3.mkdir(parents=True, exist_ok=True)  # do not fail when the cell is re-run

for i in tqdm(best_paper_ids['Best Paper']):
    create_mp3(summaries[i], Path(paper_mp3, 'paper_'+str(i)+'.mp3'))

In [25]:
with open(paper_overview, 'rb') as f:
    overview = pickle.load(f)

Markdown(overview)
#create_mp3(Markdown(overview), Path(paper_mp3, 'overview.mp3'))

Hello everyone!  Today, we're going to explore some super cool things computer scientists are working on.  It's all about making computers smarter and more helpful!  I'll explain some of the amazing research presented at a recent conference.

**Section 1:  What Computers Can Do Now (Major Use Cases)**

Imagine computers that can design better medicines, predict the weather more accurately, or even help you find exactly what you’re looking for in a huge online store! These are some of the things computer scientists are making possible.  At this conference, researchers showed us how computers are getting better at:

* **Designing new materials:**  Computers can now help predict the shape and properties of tiny materials (nanomaterials), leading to better batteries, medicines, and even stronger buildings.
* **Optimizing business processes:**  Computers are learning to make better decisions all on their own (Reinforcement Learning), helping businesses allocate resources efficiently, improve their supply chains, and personalize marketing campaigns.
* **Understanding images and text:**  Computers can now “understand” both pictures and words, allowing for better image searches, automated descriptions of images, and improved content moderation.
* **Making accurate predictions:**  Computers are getting better at predicting future trends (like sales or stock prices) by analyzing time series data. They’re also making great strides in predicting the properties of molecules, essential for drug discovery.
* **Generating realistic data:**  Computers can create synthetic data—artificial data that mimics real data—including images, videos, text, and sounds. This is helpful for testing, improving existing models, and protecting privacy.
* **Automating complex tasks:**   Computers can use AI to write code, perform various actions on web pages, or even act as virtual assistants that automate many tasks you do every day.
* **Improving health:**   AI is analyzing medical data like EEGs and CT scans to detect illnesses earlier and with greater accuracy, leading to better treatment and improved patient care.
* **Creating 3D models:** Computers are learning to create realistic 3D models of moving objects from videos, which has implications for everything from gaming to robotics to medical imaging.
* **Protecting your data:**  Computer scientists are creating new ways to analyze sensitive data while maintaining strong privacy guarantees using techniques like differential privacy.


**Section 2:  The Problems Computers Are Solving**

These amazing capabilities solve some serious problems:

* **Slow and expensive material development:**  Creating new nanomaterials is currently a slow and expensive process. AI can speed things up and reduce costs.
* **Inefficient decision-making:**  Businesses often struggle to make the best decisions under uncertainty.  AI can help by learning optimal strategies through trial and error.
* **Difficulty in processing large images and text:** Analyzing very large amounts of data is often difficult for computers.  Improved methods allow for efficient processing.
* **Inaccurate predictions:**  Existing forecasting models might be imprecise, leading to poor planning and resource allocation.  AI is improving these predictions.
* **Lack of realistic synthetic data:** Generating realistic simulated environments or data is computationally expensive. Advanced AI is making this more affordable and achievable.
* **Difficulty in automating complex tasks:** Automating things like answering complex questions from data or interacting with web pages is challenging.  New AI models are getting significantly better at doing this.
* **Compromised data privacy:** Many business decisions rely on sensitive customer data.   Improved AI methods enhance privacy protection during model training.
* **Slow and costly model training:** Training large AI models can take a very long time and require lots of computational resources, adding to business costs. This is also a problem when trying to adapt to new tasks, or handle situations where you don't have access to all the model's components ("black-box").  Improved training methods can improve this considerably.
* **Poorly calibrated models (overconfidence):** Highly accurate models can sometimes be overconfident in their predictions, leading to risky decisions.  New methods are improving model reliability.


**Section 3: How Computers Are Solving These Problems**

Computer scientists use many clever techniques:

* **Artificial Intelligence (AI):**  This is the overall field of making computers smarter, encompassing the techniques below.
* **Machine Learning (ML):**  This is like teaching a computer to learn from examples, without being explicitly programmed.
* **Deep Learning:** This is a more advanced type of machine learning, typically involving complex interconnected systems called neural networks.  These neural networks have a similar concept to the structure of the human brain.
* **Reinforcement Learning:** This is like teaching a computer through trial and error using "rewards" for correct answers.
* **Neural Networks:**  These are the mathematical models that act like the "brains" of AI systems.  They are inspired by the network of neurons in the human brain.
* **Graph Neural Networks:**  Specifically for handling data that consists of networks of relationships (like customers and products).
* **Transformers:**  An extremely efficient and powerful type of neural network used for tasks like language processing and image understanding.
* **Diffusion Models:**  A powerful technique for generating images or other data by gradually adding and removing noise from a signal.
* **Optimal Transport:** A mathematical framework for comparing and transforming data distributions.  It is used to understand similarities and differences between datasets.
* **Bayesian Methods:** A way of using probabilistic modeling to quantify and handle uncertainty, making it applicable to situations with limited or incomplete data.  This enables incorporating existing knowledge ("prior" information).
* **Genetic Programming and Evolutionary Algorithms:** These strategies automate the process of improving models, similar to how evolution works.
* **Various Optimization Algorithms:**  Computer scientists use sophisticated math to find the best settings for AI models, including gradient descent, Adam, and other more advanced methods.  These methods are designed to find solutions to complex mathematical problems efficiently.
* **Various statistical techniques:**  Including maximum likelihood estimation, hypothesis testing, and Bayesian inference, depending on the specific problem.
* **Cryptographic methods:** Used to protect sensitive data during analysis using methods like differential privacy and secure multi-party computation.  This ensures the privacy of each individual data contribution.



**Section 4:  The Future of Computer Science**

There's so much more to discover!  Researchers are working on:

* **Making AI more efficient and sustainable:**  Reducing the energy needed to run AI models and reducing their environmental impact.
* **Building more trustworthy and reliable AI:**  Developing methods to ensure AI systems are less prone to errors and biases.
* **Improving AI explainability:**  Making it easier to understand how AI models make their decisions.
* **Developing AI that can learn and adapt continuously:**  Creating AI systems that can learn from new data without "forgetting" what they already know.
* **Expanding AI capabilities:**  Improving the way AI processes images, videos, sound, and other types of data.
* **Creating better tools for automating various tasks:**   This includes developing AI models for tasks like code generation, web automation, and even acting as virtual assistants.
* **Creating better methods for combining various types of data:** This includes methods to combine image and text data for advanced analysis, or combining various time-series data that may be collected at different frequencies (daily, monthly, quarterly).
* **Improving methods for handling large datasets and optimizing AI model performance**:  This involves making models work efficiently with less data and improving the speed and stability of the training process.  It also involves handling "noisy" data (data with errors) and making models less sensitive to these errors.


Computer science is a field full of exciting possibilities!  I hope this overview has sparked your curiosity and maybe even inspired you to learn more.


In [29]:
overview_text = \
'''
Hello everyone! Today, we're going to explore some super cool things computer scientists are working on. It's all about making computers smarter and more helpful! I'll explain some of the amazing research presented at a recent conference.

Section 1: What Computers Can Do Now (Major Use Cases)

Imagine computers that can design better medicines, predict the weather more accurately, or even help you find exactly what you’re looking for in a huge online store! These are some of the things computer scientists are making possible. At this conference, researchers showed us how computers are getting better at:

Designing new materials: Computers can now help predict the shape and properties of tiny materials (nanomaterials), leading to better batteries, medicines, and even stronger buildings.
Optimizing business processes: Computers are learning to make better decisions all on their own (Reinforcement Learning), helping businesses allocate resources efficiently, improve their supply chains, and personalize marketing campaigns.
Understanding images and text: Computers can now “understand” both pictures and words, allowing for better image searches, automated descriptions of images, and improved content moderation.
Making accurate predictions: Computers are getting better at predicting future trends (like sales or stock prices) by analyzing time series data. They’re also making great strides in predicting the properties of molecules, essential for drug discovery.
Generating realistic data: Computers can create synthetic data—artificial data that mimics real data—including images, videos, text, and sounds. This is helpful for testing, improving existing models, and protecting privacy.
Automating complex tasks: Computers can use AI to write code, perform various actions on web pages, or even act as virtual assistants that automate many tasks you do every day.
Improving health: AI is analyzing medical data like EEGs and CT scans to detect illnesses earlier and with greater accuracy, leading to better treatment and improved patient care.
Creating 3D models: Computers are learning to create realistic 3D models of moving objects from videos, which has implications for everything from gaming to robotics to medical imaging.
Protecting your data: Computer scientists are creating new ways to analyze sensitive data while maintaining strong privacy guarantees using techniques like differential privacy.
Section 2: The Problems Computers Are Solving

These amazing capabilities solve some serious problems:

Slow and expensive material development: Creating new nanomaterials is currently a slow and expensive process. AI can speed things up and reduce costs.
Inefficient decision-making: Businesses often struggle to make the best decisions under uncertainty. AI can help by learning optimal strategies through trial and error.
Difficulty in processing large images and text: Analyzing very large amounts of data is often difficult for computers. Improved methods allow for efficient processing.
Inaccurate predictions: Existing forecasting models might be imprecise, leading to poor planning and resource allocation. AI is improving these predictions.
Lack of realistic synthetic data: Generating realistic simulated environments or data is computationally expensive. Advanced AI is making this more affordable and achievable.
Difficulty in automating complex tasks: Automating things like answering complex questions from data or interacting with web pages is challenging. New AI models are getting significantly better at doing this.
Compromised data privacy: Many business decisions rely on sensitive customer data. Improved AI methods enhance privacy protection during model training.
Slow and costly model training: Training large AI models can take a very long time and require lots of computational resources, adding to business costs. This is also a problem when trying to adapt to new tasks, or handle situations where you don't have access to all the model's components ("black-box"). Improved training methods can improve this considerably.
Poorly calibrated models (overconfidence): Highly accurate models can sometimes be overconfident in their predictions, leading to risky decisions. New methods are improving model reliability.
Section 3: How Computers Are Solving These Problems

Computer scientists use many clever techniques:

Artificial Intelligence (AI): This is the overall field of making computers smarter, encompassing the techniques below.
Machine Learning (ML): This is like teaching a computer to learn from examples, without being explicitly programmed.
Deep Learning: This is a more advanced type of machine learning, typically involving complex interconnected systems called neural networks. These neural networks have a similar concept to the structure of the human brain.
Reinforcement Learning: This is like teaching a computer through trial and error using "rewards" for correct answers.
Neural Networks: These are the mathematical models that act like the "brains" of AI systems. They are inspired by the network of neurons in the human brain.
Graph Neural Networks: Specifically for handling data that consists of networks of relationships (like customers and products).
Transformers: An extremely efficient and powerful type of neural network used for tasks like language processing and image understanding.
Diffusion Models: A powerful technique for generating images or other data by gradually adding and removing noise from a signal.
Optimal Transport: A mathematical framework for comparing and transforming data distributions. It is used to understand similarities and differences between datasets.
Bayesian Methods: A way of using probabilistic modeling to quantify and handle uncertainty, making it applicable to situations with limited or incomplete data. This enables incorporating existing knowledge ("prior" information).
Genetic Programming and Evolutionary Algorithms: These strategies automate the process of improving models, similar to how evolution works.
Various Optimization Algorithms: Computer scientists use sophisticated math to find the best settings for AI models, including gradient descent, Adam, and other more advanced methods. These methods are designed to find solutions to complex mathematical problems efficiently.
Various statistical techniques: Including maximum likelihood estimation, hypothesis testing, and Bayesian inference, depending on the specific problem.
Cryptographic methods: Used to protect sensitive data during analysis using methods like differential privacy and secure multi-party computation. This ensures the privacy of each individual data contribution.
Section 4: The Future of Computer Science

There's so much more to discover! Researchers are working on:

Making AI more efficient and sustainable: Reducing the energy needed to run AI models and reducing their environmental impact.
Building more trustworthy and reliable AI: Developing methods to ensure AI systems are less prone to errors and biases.
Improving AI explainability: Making it easier to understand how AI models make their decisions.
Developing AI that can learn and adapt continuously: Creating AI systems that can learn from new data without "forgetting" what they already know.
Expanding AI capabilities: Improving the way AI processes images, videos, sound, and other types of data.
Creating better tools for automating various tasks: This includes developing AI models for tasks like code generation, web automation, and even acting as virtual assistants.
Creating better methods for combining various types of data: This includes methods to combine image and text data for advanced analysis, or combining various time-series data that may be collected at different frequencies (daily, monthly, quarterly).
Improving methods for handling large datasets and optimizing AI model performance: This involves making models work efficiently with less data and improving the speed and stability of the training process. It also involves handling "noisy" data (data with errors) and making models less sensitive to these errors.
Computer science is a field full of exciting possibilities! I hope this overview has sparked your curiosity and maybe even inspired you to learn more.
'''

create_mp3(overview_text, Path(paper_mp3, 'overview.mp3'))


0
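
The plain-text copy above was pasted by hand so that the text-to-speech step does not read out Markdown markers. A rough sketch of doing the same conversion in code is given below. `markdown_to_plain` is a hypothetical helper that only handles the markers seen in this overview (bold, bullets, headings); it is not a general Markdown parser.

```python
import re

def markdown_to_plain(text: str) -> str:
    """Remove the Markdown markers used in the overview (bold, bullets, headings)."""
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)                 # **bold** -> bold
    text = re.sub(r"^[ \t]*[*-][ \t]+", "", text, flags=re.M)     # leading bullet markers
    text = re.sub(r"^#+[ \t]*", "", text, flags=re.M)             # heading markers
    text = re.sub(r"[ \t]{2,}", " ", text)                        # collapse runs of spaces
    return text.strip()

sample = (
    "**Section 1:  What Computers Can Do Now**\n"
    "\n"
    "* **Designing new materials:**  Computers can help."
)
plain = markdown_to_plain(sample)
print(plain)
```

Assuming `create_mp3` accepts a plain string, as the call above suggests, the result of `markdown_to_plain(overview)` could be passed to it directly instead of a pasted copy.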

## Index in Neo4j

In [4]:
from utils.paper_QA import *

docs = paper2doc(all_papers)

In [9]:
[i for i in range(0, 1000, 100)]

[0, 100, 200, 300, 400, 500, 600, 700, 800, 900]

In [32]:
import time

NEO4J_URI='neo4j+s://e0a3a179.databases.neo4j.io'
NEO4J_USERNAME='neo4j'
NEO4J_PASSWORD='<>'

In [None]:
db = EmbeddingDB(NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD, docs[:100])

In [None]:

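# Resume inserting from document 700 in batches of 100, pausing 30 s between batches
for i in tqdm(range(700, len(docs), 100)):
    db.insert(docs[i:i+100])  # slicing already clamps at len(docs)
    time.sleep(30)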

  7%|█████▉                                                                             | 17/240 [13:46<2:25:35, 39.17s/it][#D4D2]  _: <CONNECTION> error: Failed to read from defunct connection IPv4Address(('e0a3a179.databases.neo4j.io', 7687)) (ResolvedIPv4Address(('34.28.184.63', 7687))): OSError('No data')
Transaction failed and will be retried in 0.8328822951575565s (Failed to read from defunct connection IPv4Address(('e0a3a179.databases.neo4j.io', 7687)) (ResolvedIPv4Address(('34.28.184.63', 7687))))
 25%|████████████████████▊                                                              | 60/240 [42:11<2:00:51, 40.29s/it][#D4D1]  _: <CONNECTION> error: Failed to read from defunct connection ResolvedIPv4Address(('34.28.184.63', 7687)) (ResolvedIPv4Address(('34.28.184.63', 7687))): OSError('No data')
Unable to retrieve routing information
Transaction failed and will be retried in 0.8240881554011467s (Unable to retrieve routing information)
 38%|███████████████████████████████       
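
The log above shows connection errors during a long-running insert. One way to keep such a run going is to retry each failed batch with exponential backoff and record the offsets that still fail. The sketch below is generic: `insert_in_batches` is a hypothetical helper, and it assumes the insert function raises an exception when a batch ultimately fails, which is not confirmed for `db.insert`.

```python
import time
from typing import Callable, List, Sequence

def insert_in_batches(
    insert_fn: Callable[[Sequence], None],
    items: Sequence,
    start: int = 0,
    batch_size: int = 100,
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Insert items batch by batch, retrying each failed batch with exponential backoff.

    Returns the start offsets of batches that still failed after all retries,
    so they can be re-run later.
    """
    failed = []
    for offset in range(start, len(items), batch_size):
        batch = items[offset:offset + batch_size]  # slicing clamps at len(items)
        for attempt in range(max_retries + 1):
            try:
                insert_fn(batch)
                break
            except Exception:  # broad on purpose: the driver's exception types are not shown here
                if attempt == max_retries:
                    failed.append(offset)
                else:
                    sleep(base_delay * 2 ** attempt)
    return failed

# Demonstration with a fake insert function that fails once.
calls = {"n": 0}
stored = []

def flaky_insert(batch):
    calls["n"] += 1
    if calls["n"] == 2:  # simulate one transient connection error
        raise OSError("No data")
    stored.extend(batch)

delays = []  # record requested delays instead of actually sleeping
failed_offsets = insert_in_batches(flaky_insert, list(range(250)), sleep=delays.append)
print(failed_offsets, delays, len(stored))
```

With the names used in this notebook, a call such as `insert_in_batches(db.insert, docs, start=700)` would correspond to the loop above, under the stated assumption about how `db.insert` reports failures.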

## Query Neo4j

In [33]:
from neo4j import GraphDatabase

with GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD)) as driver:
    driver.verify_connectivity()
    print("Connection established.")

Connection established.


In [36]:
from langchain_neo4j import Neo4jVector
from langchain_google_genai import GoogleGenerativeAIEmbeddings
import os

if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = "<>"

index_name = "vector"  # default index name
embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

store = Neo4jVector.from_existing_index(
    embeddings,
    url=NEO4J_URI,
    username=NEO4J_USERNAME,
    password=NEO4J_PASSWORD,
    index_name=index_name,
)

In [59]:
query = 'Why is it difficult to process large images and text?'
# Use a separate name so the `docs` list built for indexing is not overwritten
results = store.similarity_search_with_score(query, k=5)

In [62]:
results[0][0].metadata

{'section_title': '1. Introduction', 'title': ''}

In [50]:
context = ''
for doc, score in results:
    context += doc.page_content + '\n'

prompt = f'Answer question based on given context. Question: {query} \n Context: {context}'
response = flash.generate_content(prompt)
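
The cell above concatenates all retrieved chunks into one prompt. A variant that labels each chunk with its section title and score, and stops at a character budget, might look like the sketch below. `Chunk` is a stand-in for the retrieved document objects (only `page_content` and `metadata` are used), and `build_prompt` and the `max_context_chars` default are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """Stand-in for a retrieved document; only the two attributes used below."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def build_prompt(query, scored_chunks, max_context_chars=20000):
    """Assemble a question-answering prompt from (chunk, score) pairs.

    Chunks are added in the order given until the character budget is reached.
    """
    parts = []
    used = 0
    for chunk, score in scored_chunks:
        section = chunk.metadata.get("section_title", "unknown section")
        entry = f"[{section} | score={score:.3f}]\n{chunk.page_content}"
        if used + len(entry) > max_context_chars:
            break
        parts.append(entry)
        used += len(entry)
    context = "\n\n".join(parts)
    return (
        "Answer the question based on the given context.\n"
        f"Question: {query}\n"
        f"Context:\n{context}"
    )

# Toy retrieval results; the second chunk exceeds the small budget and is dropped.
toy_results = [
    (Chunk("Encoding cost scales quadratically.", {"section_title": "1. Introduction"}), 0.91),
    (Chunk("x" * 50), 0.50),
]
toy_prompt = build_prompt("Why is long text expensive?", toy_results, max_context_chars=80)
print(toy_prompt)
```

Labelling the chunks makes it easier to trace an answer back to the section it came from, and the budget keeps the prompt size predictable.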

In [51]:
response

response:
GenerateContentResponse(
    done=True,
    iterator=None,
    result=protos.GenerateContentResponse({
      "candidates": [
        {
          "content": {
            "parts": [
              {
                "text": "Processing large images and text is difficult due to memory limitations in modern computer systems.  Current computer vision models are trained on smaller images, and processing large images requires either down-sampling or cropping, both of which lead to a significant loss of high-frequency information and global context crucial for many real-world tasks.  In the case of text, the encoding cost scales quadratically with sequence length, making processing very long texts computationally expensive.\n"
              }
            ],
            "role": "model"
          },
          "finish_reason": "STOP",
          "avg_logprobs": -0.16437047251154868
        }
      ],
      "usage_metadata": {
        "prompt_token_count": 3269,
        "candidates_token_c

## Store files for streamlit

In [None]:
paper_pdf_dir = Path(PROJECT_DIR, 'paper_pdfs')
paper_summaries = Path(PROJECT_DIR, 'paper_summaries')
paper_parsed_dir = Path(PROJECT_DIR, 'paper_parsed.pkl')
paper_summaries_pkl = Path(PROJECT_DIR, 'paper_summaries.pkl') #{'overview':<>, 'best':[{'tag':<>, 'summary':<>}, ], 'all': []}
paper_overview = Path(PROJECT_DIR, 'paper_overview.pkl')
best_paper_ids_pkl = Path(PROJECT_DIR, 'best_paper_ids.pkl')
paper_mp3 = Path(PROJECT_DIR, 'mp3') #overview.mp3, <tag>.mp3

In [57]:
streamlit_data_file = Path(PROJECT_DIR, 'streamlit_data.pkl')

with open(paper_summaries_pkl, 'rb') as f:
    summaries = pickle.load(f)

with open(paper_overview, 'rb') as f:
    overview = pickle.load(f)

with open(best_paper_ids_pkl, 'rb') as f:
    best_paper_ids = pickle.load(f)

titles = [p['title'] for p in all_papers]

with open(streamlit_data_file, 'wb') as f:
    pickle.dump((summaries, overview, best_paper_ids, titles), f)
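
The file written above holds a 4-tuple, so any reader has to unpack it in the same order. A possible loader for the app side is sketched below. `load_streamlit_data` is a hypothetical helper, and the round trip uses toy values in a temporary directory rather than the real data file.

```python
import pickle
import tempfile
from pathlib import Path

def load_streamlit_data(path):
    """Load the (summaries, overview, best_paper_ids, titles) tuple into a dict."""
    with open(path, "rb") as f:
        summaries, overview, best_paper_ids, titles = pickle.load(f)
    if len(summaries) != len(titles):
        raise ValueError("summaries and titles must be aligned by index")
    return {
        "summaries": summaries,
        "overview": overview,
        "best_paper_ids": dict(best_paper_ids),  # plain dict, no defaultdict side effects
        "titles": titles,
    }

# Round-trip a toy tuple to show the expected layout (illustrative values only).
with tempfile.TemporaryDirectory() as tmp:
    toy_file = Path(tmp, "streamlit_data.pkl")
    with open(toy_file, "wb") as f:
        pickle.dump(
            (["summary A", "summary B"], "overview text", {"Best Paper": [1]}, ["Title A", "Title B"]),
            f,
        )
    data = load_streamlit_data(toy_file)

best_titles = [data["titles"][i] for i in data["best_paper_ids"]["Best Paper"]]
print(best_titles)
```

Since `pickle.load` can execute arbitrary code, a loader like this should only be pointed at files produced by this notebook.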

In [55]:
len(titles)

2610

In [56]:
len(summaries)

2610

In [58]:
best_paper_ids

defaultdict(list,
            {'Best Paper': [81, 220, 530, 1297, 1298, 1476, 1725, 1858, 2033]})