- March 23 - plugins preview https://openai.com/blog/chatgpt-plugins
- gdb video clip https://twitter.com/gdb/status/1638971232443076609
- It's writing & executing MoviePy code (https://zulko.github.io/moviepy/), which uses ffmpeg under the hood. So the time here is basically all waiting for ffmpeg to run (a big step up from the usual bottleneck of looking up the set of arguments to do what I want!).
- Jul 6-8 - GA release https://help.openai.com/en/articles/6825453-chatgpt-release-notes
- Jul 10 - Latent Space Writeup: https://www.latent.space/p/code-interpreter#details
https://chat.openai.com/share/38301a21-bf09-4274-848b-b9cd5523d461
and requirements.txt https://twitter.com/swyx/status/1677589535587467264 (annotated)
- Video manipulation
- using moviePy https://twitter.com/gdb/status/1638971232443076609
- Accepted filetypes
- .zip files: it can unzip files and continue to open up and inspect those files
- "I apologize for the oversight. The variable `unzip_folder` was also lost in the reset. We'll need to extract the data from the zip file again before proceeding"
- .csv files
- .xls files https://twitter.com/emollick/status/1652170706312896512
- .png files
- lightweight image editing and resizing
- .mov files
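As a rough illustration of the .zip workflow noted above, the sketch below unpacks an archive and lists what is inside, using only the standard library. The function name `extract_and_list` and the sample file names are hypothetical; they are not taken from any Code Interpreter session.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_and_list(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the sorted relative file paths."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return sorted(
        p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()
    )


# Build a small throwaway archive so the example is self-contained.
workdir = Path(tempfile.mkdtemp())
sample_zip = workdir / "sample.zip"
with zipfile.ZipFile(sample_zip, "w") as archive:
    archive.writestr("data/heroes.csv", "name,height\nA,180\n")
    archive.writestr("README.txt", "hypothetical sample archive\n")

extracted_files = extract_and_list(sample_zip, workdir / "unzipped")
print(extracted_files)
```

Returning relative paths makes it easy to pick the next file to open and inspect.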
- Can draw https://twitter.com/swyx/status/1677567255377494016?s=20
- but has challenges
- create a graphviz chart - of code https://chat.openai.com/share/adc73608-4c80-497b-ad03-31de28bfa919
- Can map using folium
- https://twitter.com/emollick/status/1677570911124242433?s=20
- render the folium map to an image and show the image
- Data analyst
- Open ended analysis:
- "Here is some data on superhero powers, look through it and tell me what you find"
- "I am interested in doing some predictive modelling, where we can predict what powers a hero might have based on other factors."
- Joins: "Could you first combine the two datasets"
- Integrity checks: "... and confirm that they combined properly"
- Data Cleaning: "does the data need cleaning in any way?"
- Data Preprocessing: "Great! Cluster analysis can help us group similar superheroes together based on their attributes. Before we start, we need to preprocess our data. This will involve the following steps:" https://chat.openai.com/share/770ab170-8fed-402f-850f-e6d7273e77cd
- Summary of analysis: "what is your summary of all this analysis, in bullet points?"
- Create Interactive Dashboard https://emollick.github.io/Superhero/
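The join and integrity-check prompts above can be pictured with a small library-free sketch. The helper `left_join` and the toy rows are hypothetical. In practice the same check would more likely be done with pandas, for example `DataFrame.merge` with `indicator=True` or `validate=...`, but the sketch avoids that dependency.

```python
def left_join(left_rows, right_rows, key):
    """Left-join two lists of dicts on key; also return the keys that found no match."""
    right_by_key = {}
    for row in right_rows:
        if row[key] in right_by_key:
            # A duplicated key would silently multiply or overwrite rows.
            raise ValueError(f"duplicate key in right table: {row[key]!r}")
        right_by_key[row[key]] = row

    joined, unmatched = [], []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is None:
            unmatched.append(row[key])
        joined.append({**row, **(match or {})})
    return joined, unmatched


# Toy data standing in for the two superhero datasets (values are made up).
heroes = [
    {"name": "A", "height_cm": 180},
    {"name": "B", "height_cm": 165},
    {"name": "C", "height_cm": 975},
]
powers = [
    {"name": "A", "flight": True},
    {"name": "B", "flight": False},
]

joined, unmatched = left_join(heroes, powers, "name")
print(len(joined), unmatched)
```

The "combined properly" check then amounts to confirming the row count is unchanged and looking at which keys failed to match.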
- Dataviz
- Can upload more libraries and dependencies
- Behavior
- Giving options, and pausing for confirmation
- Great! Before we proceed, we need to further refine the scope of our predictive task. As you might notice, we have a large number of potential powers (labels) to predict, which could make the problem very complex. Would you like to predict all possible powers, or focus on a specific subset of powers?
- Another important point to note is that our current dataset is quite large and might be too complex for certain algorithms. We might need to simplify our problem further by reducing the number of features. This could be done through a dimensionality reduction technique or simply selecting a subset of the most relevant features.
- Finally, we need to choose a suitable model for our multi-label classification task. Some possibilities include decision trees, random forests, or neural networks. Each of these options has its own strengths and weaknesses, and the best choice often depends on the specific characteristics of the dataset. Please let me know how you would like to proceed.
- "pick which you think is best, be clear about strengths and weaknesses"
- Outliers can have a large influence on machine learning models, particularly those that use distance measures like Random Forest. Given this, we could:
- Remove these outliers from the data.
- Winsorize the data, which means capping the extreme values at a certain threshold.
- Leave the data as-is if we believe the extreme values are accurate and important to the model.
- Which approach would you like to take?
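Of the three outlier options quoted above, winsorizing is the least self-explanatory, so here is a minimal sketch of it. The `winsorize` helper, its nearest-rank percentile rule, and the sample heights are assumptions made for illustration; they are not what the session actually ran.

```python
import math


def winsorize(values, lower_pct=5, upper_pct=95):
    """Cap values outside the given percentiles (nearest-rank method)."""
    ordered = sorted(values)
    n = len(ordered)

    def percentile(pct):
        # Nearest-rank percentile: the value at 1-based rank ceil(pct/100 * n).
        rank = max(1, math.ceil(pct * n / 100))
        return ordered[rank - 1]

    low, high = percentile(lower_pct), percentile(upper_pct)
    return [min(max(v, low), high) for v in values]


# Made-up heights in cm, with one extreme value like those seen in the data.
heights_cm = [165, 170, 175, 180, 183, 185, 188, 190, 193, 975]
capped = winsorize(heights_cm, lower_pct=10, upper_pct=90)
print(capped)
```

Unlike removing outliers, this keeps every row and only pulls the extreme values in to the chosen thresholds.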
- Recovery from minor data issues
It seems like there are still some values in the data that are causing issues. In the error message, it's mentioned that the input contains NaN, infinity, or a value too large for dtype('float32').
It appears there are no missing (NaN) or infinite values in the dataset after our preprocessing steps. The error during model training might be due to a different issue.
Looking at the unique values in the 'Height' and 'Weight' columns, we see some extreme values. For example, heights of 975 cm and 876 cm, and weights of 900 kg and 855 kg are unusually high for humans and most known superhero characters. These could be misrecorded data, placeholder values for unknowns, or correct data for exceptionally large characters.
IndexError: index 0 is out of bounds for axis 0 with size 0
- I apologize for the confusion. It seems like there was a misunderstanding in my previous step. I attempted to use the encoded name (a numerical value) to fetch the original name from the preprocessed data, but this approach was not correct because the numerical value does not represent an index in the original data.
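The "NaN, infinity, or a value too large for dtype('float32')" error quoted above names three separate conditions. A hedged sketch of checking each one explicitly follows. The helper `find_bad_values` and the sample rows are hypothetical; only the column names come from the notes.

```python
import math

FLOAT32_MAX = 3.4028234663852886e38  # largest finite float32 value


def find_bad_values(rows, columns):
    """Return (row_index, column, value) for missing, non-finite, or float32-overflowing entries."""
    problems = []
    for i, row in enumerate(rows):
        for col in columns:
            value = row.get(col)
            if value is None or not math.isfinite(value) or abs(value) > FLOAT32_MAX:
                problems.append((i, col, value))
    return problems


# Made-up rows: 975.0 is extreme but finite, so it is not flagged by this check.
rows = [
    {"Height": 180.0, "Weight": 80.0},
    {"Height": float("nan"), "Weight": 90.0},
    {"Height": 975.0, "Weight": float("inf")},
    {"Height": 170.0, "Weight": 1e39},
]
problems = find_bad_values(rows, ["Height", "Weight"])
print(problems)
```

A check like this passing is consistent with the notes: extreme but valid numbers such as a 975 cm height need a separate outlier review.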
- Requesting for more data
- Environment
- Preemption recovery? "I'm sorry, but it seems there was a disruption in the code execution environment causing a reset."
- "Given the repeated resets in the code execution state, I think the most efficient way forward is to write a self-contained script that does not rely on any variables from earlier in the session."
- Dynamically imports modules
- Catastrophic recovery and retry: "Let's start from the beginning by extracting the data from the zip file again. I'll make sure to correctly specify the locations this time." https://chat.openai.com/share/770ab170-8fed-402f-850f-e6d7273e77cd
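The "self-contained script" idea quoted above can be sketched as a step that rebuilds everything it needs instead of trusting earlier session state. The helper names `ensure_extracted` and `count_csv_rows` and the sample archive are hypothetical. Deleting the extracted folder is used here only to imitate a reset.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def ensure_extracted(zip_path, dest_dir):
    """Re-extract the archive if dest_dir is missing; safe to call at the top of every step."""
    dest = Path(dest_dir)
    if not dest.is_dir():
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(dest)
    return dest


def count_csv_rows(zip_path, dest_dir, member):
    """Self-contained step: imports and inputs are re-established inside the function."""
    import csv  # imported here so a lost session-level import cannot cause a NameError

    folder = ensure_extracted(zip_path, dest_dir)
    with open(folder / member, newline="") as handle:
        return sum(1 for _ in csv.DictReader(handle))


# Throwaway archive so the example runs on its own.
workdir = Path(tempfile.mkdtemp())
sample_zip = workdir / "sample.zip"
with zipfile.ZipFile(sample_zip, "w") as archive:
    archive.writestr("heroes.csv", "name,height\nA,180\nB,165\n")

first = count_csv_rows(sample_zip, workdir / "unzipped", "heroes.csv")
shutil.rmtree(workdir / "unzipped")  # imitate extracted files being lost
second = count_csv_rows(sample_zip, workdir / "unzipped", "heroes.csv")
print(first, second)
```

Because the step only depends on the uploaded archive's path, rerunning it after a reset gives the same result.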
- can you display this .png inline?
- ran `from IPython.display import Image`, then `Image(filename='/mnt/data/flowchart_enhanced.png')` to display the generated flowchart
- output: `<IPython.core.display.Image object>`
- KeyboardInterrupt: after long execution https://chat.openai.com/share/adc73608-4c80-497b-ad03-31de28bfa919
- Image processing with PIL: "make the background of this image black"
- Cv2 library for computer vision, pytesseract for OCR
- running subprocess.run (according to Simonw)
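For reference, the general shape of a `subprocess.run` call is sketched below. It launches the current Python interpreter as a stand-in child process, which is an assumption chosen so the example runs anywhere. The notes do not say which commands were actually run in the sandbox.

```python
import subprocess
import sys

# Run a child process, capture its output as text, and raise if it exits non-zero.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,
    text=True,
    check=True,
)
child_output = result.stdout.strip()
print(child_output)
```

Passing the command as a list avoids shell quoting problems, and `check=True` turns a failing command into an exception instead of a silent non-zero return code.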
- display of tables is purely LLM generated and slow. need to swap to component and dump code output thru the component
- "the code execution state was reset" messages quite often https://chat.openai.com/share/adc73608-4c80-497b-ad03-31de28bfa919
- environment recovery is bad: https://twitter.com/swyx/status/1677571294416756737?s=20
- it frequently assumes very common dependencies are already imported or defined, then hits a `NameError`:
NameError: name 'df' is not defined
NameError: name 'pd' is not defined
NameError: name 'zipfile' is not defined
NameError: name 'np' is not defined
NameError: name 'combinations' is not defined
"I apologize for the oversight. It seems I forgot to import the `combinations` function from the `itertools` module."
NameError: name 'ImageDraw' is not defined
"I apologize for the error, it appears I forgot to import the `ImageDraw` module from the `PIL` package."
- no internet access
- To proceed, we would need the descriptions of each package. These can typically be obtained from the package's documentation or from PyPI (the Python Package Index). However, fetching data from the internet is not permitted in the current environment.
- it doesn't "fill in the blanks" from LLM knowledge: an instruction to "guess the data you don't have" fails
- thinks it doesn't have folium... but it does
- render the folium map to an image and show the image
- Please note that due to the limitations of my current environment, I'm unable to convert the HTML file to an image directly within this notebook. However, you can do this in a local Python environment using tools like `imgkit` or `selenium`.
- Unfortunately, without more data and given the constraints of this environment, it's not possible to perform a detailed spatial analysis of the crime data. For a complete analysis, I would recommend using a GIS (Geographic Information System) software or library (like `geopandas` in Python) and obtaining more detailed spatial data.
- clustering and naming
- can't call the OpenAI API to embed text, doesn't have transformers, and no UMAP
- manual ChatGPT Code Interpreter git scraper. Includes libraries, env vars, complete filepath listing. Can track changes. Todo: system package list, home folder content, etc. Added some file system extracts like the `ace_client`, which is part of the jupyter notebook setup https://github.com/tomviner/scrape-ai/tree/main#chatgpt-code-interpreter
- https://github.com/SkalskiP/awesome-chatgpt-code-interpreter-experiments