## Q. What are the most common prompts that developers ask ChatGPT? How does ChatGPT answer those prompts - Code or No Code?


In [14]:
# Import all required packages
import string

import networkx as nx
import nltk
import pandas as pd
from nltk.corpus import stopwords
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# NLTK resources used by the tokenizers and the stopword list; uncomment on first run
# nltk.download('punkt')
# nltk.download('stopwords')

In [15]:
# Flatten the nested "Sources" records of the JSON file into a flat DataFrame
data = pd.json_normalize(pd.read_json("20231012_235128_issue_sharings.json")["Sources"])
data.columns

Index(['Type', 'URL', 'Author', 'RepoName', 'RepoLanguage', 'Number', 'Title',
       'Body', 'CreatedAt', 'ClosedAt', 'UpdatedAt', 'State',
       'ChatgptSharing'],
      dtype='object')

**The issue-sharings file is used here to test the code.**

In [16]:
data = pd.read_json("20231012_235128_issue_sharings.json")
data = pd.json_normalize(data['Sources'])
chatgpt_sharing_data = pd.json_normalize(data['ChatgptSharing'].explode())
chatgpt_sharing_data.columns

Index(['URL', 'Status', 'DateOfConversation', 'DateOfAccess', 'Title',
       'NumberOfPrompts', 'TokensOfPrompts', 'TokensOfAnswers', 'Model',
       'Conversations', 'HTMLContent', 'Mention.MentionedURL',
       'Mention.MentionedProperty', 'Mention.MentionedAuthor',
       'Mention.MentionedText'],
      dtype='object')

In [17]:
repo_lang = list(set(data["RepoLanguage"]))
print(*repo_lang)

GLSL PHP PLpgSQL JavaScript Clojure F# Swift C++ MATLAB Objective-C Rust HTML Go Python Verilog C# Groovy Jupyter Notebook PowerShell C Shell TypeScript Nim Makefile None Perl Vue CSS Astro TeX R Zig Ruby Java Emacs Lisp Kotlin Game Maker Language


In [18]:
# Reload the file and flatten the issue-level records
data = pd.read_json("20231012_235128_issue_sharings.json")
data = pd.json_normalize(data['Sources'])
# One row per ChatGPT sharing
chatgpt_sharing_data = pd.json_normalize(data['ChatgptSharing'].explode())
first_column_data = chatgpt_sharing_data.iloc[:, 0]
first_column_df = pd.DataFrame(first_column_data)
# Expand each issue's list of sharings, then flatten every sharing's attributes
expanded_chatgpt_sharing = data['ChatgptSharing'].apply(pd.Series).stack().reset_index(drop=True)
all_attributes_chatgpt_sharing = pd.json_normalize(expanded_chatgpt_sharing)
# One row per prompt/answer pair
conversations_data = pd.json_normalize(all_attributes_chatgpt_sharing['Conversations'].explode())
conversations_data.columns

Index(['Prompt', 'Answer', 'ListOfCode'], dtype='object')
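
The nesting that the cells above unpack (sources, then sharings, then conversations) can be pictured with plain Python. This is only a sketch: the key names come from the column listings above, while the toy record, its values, and the helper `iter_conversations` are invented for illustration.

```python
# Toy record: key names follow the column listings above; all values are invented.
sources = [
    {
        "Type": "issue",
        "RepoLanguage": "Python",
        "ChatgptSharing": [
            {
                "URL": "https://example.invalid/share/1",
                "Conversations": [
                    {
                        "Prompt": "Write a sort function",
                        "Answer": "...",
                        "ListOfCode": [{"Type": "python", "Content": "sorted(xs)"}],
                    },
                    {"Prompt": "What is recursion?", "Answer": "...", "ListOfCode": []},
                ],
            }
        ],
    }
]


def iter_conversations(sources):
    """Yield one flat dict per prompt/answer pair (hypothetical helper)."""
    for source in sources:
        # A source may have no sharings, and a sharing no conversations.
        for sharing in source.get("ChatgptSharing") or []:
            for turn in sharing.get("Conversations") or []:
                yield {
                    "RepoLanguage": source.get("RepoLanguage"),
                    "Prompt": turn["Prompt"],
                    "Answer": turn["Answer"],
                    "ListOfCode": turn["ListOfCode"],
                }


rows = list(iter_conversations(sources))
print(len(rows))
print(rows[0]["Prompt"])
```

Each yielded dict corresponds to one row of `conversations_data`, with the repository language carried along in case it is needed later.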

## I asked ChatGPT what developers typically ask it for help with.

Developers often approach ChatGPT with a wide range of requests, typically revolving around programming, debugging, learning concepts, and development tools. To identify these questions, you can focus on keywords that are common in such contexts. Here are some common types of requests and corresponding keywords:

1. **Coding Assistance**: 
   - **Keywords**: "code", "write", "script", "implement", "function", "algorithm".
   - **Example**: "Can you write a Python function to sort a list?"

2. **Debugging and Error Resolution**: 
   - **Keywords**: "debug", "error", "fix", "issue", "resolve", "exception", "traceback".
   - **Example**: "Help me debug this JavaScript error message I'm getting."

3. **Explanation of Concepts**:
   - **Keywords**: "explain", "understand", "concept", "how does", "what is", "theory", "principle".
   - **Example**: "Can you explain how recursion works in programming?"

4. **Learning and Educational Resources**:
   - **Keywords**: "learn", "tutorial", "course", "resource", "guide", "documentation", "example".
   - **Example**: "Recommend some resources to learn React."

5. **Best Practices and Design Patterns**:
   - **Keywords**: "best practice", "design pattern", "architecture", "efficient", "optimization", "structure", "model".
   - **Example**: "What are some best practices for REST API design?"

6. **Technology and Tool Recommendations**:
   - **Keywords**: "recommend", "tool", "library", "framework", "technology", "platform", "software".
   - **Example**: "Which libraries do you recommend for data visualization in Python?"

7. **Code Review and Optimization**:
   - **Keywords**: "review", "optimize", "improve", "enhance", "refactor", "performance", "efficiency".
   - **Example**: "Can you review and suggest improvements for my Ruby script?"

8. **Project Guidance and Planning**:
   - **Keywords**: "plan", "approach", "strategy", "project", "structure", "organize", "develop".
   - **Example**: "How should I plan the architecture for my mobile app project?"

9. **Integration and Configuration**:
   - **Keywords**: "integrate", "configure", "setup", "install", "deploy", "connect", "environment".
   - **Example**: "How do I integrate a payment gateway in my web application?"

10. **Version Control and Collaboration**:
    - **Keywords**: "git", "version control", "collaborate", "merge", "branch", "commit", "repository".
    - **Example**: "How do I resolve merge conflicts in Git?"

These keywords can help in identifying the nature of developers' questions. By focusing on these, you can categorize and address the specific needs of developer-related queries effectively.
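
As a rough illustration of how the categories above could be used, the sketch below counts whole-word keyword hits per category and reports the best-scoring one. The dictionary holds only a shortened subset of the lists above, and `CATEGORY_KEYWORDS` and `score_categories` are hypothetical names, not part of the notebook.

```python
import re
from collections import Counter

# Shortened subset of the categories and keywords listed above.
CATEGORY_KEYWORDS = {
    "Coding Assistance": ["code", "write", "script", "implement", "function", "algorithm"],
    "Debugging and Error Resolution": ["debug", "error", "fix", "issue", "resolve", "exception", "traceback"],
    "Explanation of Concepts": ["explain", "understand", "concept", "how does", "what is"],
    "Version Control and Collaboration": ["git", "version control", "merge", "branch", "commit", "repository"],
}


def score_categories(prompt):
    """Count whole-word keyword hits per category (hypothetical helper)."""
    text = prompt.lower()
    hits = Counter()
    for category, words in CATEGORY_KEYWORDS.items():
        for word in words:
            # \b keeps "git" from matching inside a longer word such as "digit".
            if re.search(r"\b" + re.escape(word) + r"\b", text):
                hits[category] += 1
    return hits


scores = score_categories("How do I resolve merge conflicts in Git?")
print(scores.most_common(1))
```

A prompt can hit several categories at once (here "resolve" also counts toward debugging), so treating the top score as the label is a simplification.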

In [19]:
keywords = ["code", "write", "script", "implement", "function", "algorithm", "debug", "error", 
            "fix", "issue", "resolve", "exception", "traceback", "explain", "understand", "concept", 
            "how does", "what is", "theory", "principle", "learn", "tutorial", "course", "resource", 
            "guide", "documentation", "example", "best practice", "design pattern", "architecture", 
            "efficient", "optimization", "structure", "model", "recommend", "tool", "library", 
            "framework", "technology", "platform", "software", "review", "optimize", "improve", 
            "enhance", "refactor", "performance", "efficiency", "plan", "approach", "strategy", 
            "project", "structure", "organize", "develop",  "integrate", "configure", "setup", 
            "install", "deploy", "connect", "environment", "git", "version control", "collaborate", 
            "merge", "branch", "commit", "repository", "generate", 'GLSL','PHP','PLpgSQL', 'JavaScript', 
            'Clojure', 'F#', 'Swift', 'C++', 'MATLAB', 'Objective-C', 'Rust', 'HTML', 'Go',
            'Python', 'Verilog', 'C#', 'Groovy', 'Jupyter Notebook', 'PowerShell', 'C', 'Shell',
            'TypeScript', 'Nim', 'Makefile', 'Perl', 'Vue', 'CSS', 'Astro', 'TeX', 'R', 'Zig',
            'Ruby', 'Java', 'Emacs Lisp', 'Kotlin', 'Game Maker Language']

In [22]:
def read_and_preprocess_text(text):
    # Tokenize the text into sentences
    sentences = nltk.sent_tokenize(text)

    # Tokenize each sentence into words, remove stopwords and punctuation, and convert to lowercase
    stop_words = set(nltk.corpus.stopwords.words('english') + list(string.punctuation))
    tokenized_sentences = [
        [word.lower() for word in nltk.word_tokenize(sentence) if word.isalnum() and word.lower() not in stop_words]
        for sentence in sentences
    ]

    return tokenized_sentences

def calculate_cosine_similarity(tfidf_matrix):
    # Calculate cosine similarity between sentences based on TF-IDF matrix
    similarity_matrix = cosine_similarity(tfidf_matrix, tfidf_matrix)
    return similarity_matrix

def generate_summary(text, num_sentences=3):
    # Preprocess the text
    tokenized_sentences = read_and_preprocess_text(text)

    # Convert sentences to TF-IDF matrix
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform([' '.join(sentence) for sentence in tokenized_sentences])

    # Calculate cosine similarity matrix
    similarity_matrix = calculate_cosine_similarity(tfidf_matrix)

    # Use the PageRank algorithm to rank sentences
    graph = nx.from_numpy_array(similarity_matrix)
    scores = nx.pagerank(graph)

    # Sort sentences by their scores and select the top ones
    ranked_sentences = sorted(((scores[i], sentence) for i, sentence in enumerate(tokenized_sentences)), reverse=True)
    summary_sentences = [sentence for score, sentence in ranked_sentences[:num_sentences]]

    # Join the summary sentences to form the final summary
    summary = ' '.join([' '.join(sentence) for sentence in summary_sentences])

    return summary

text = conversations_data["Prompt"][12]

summary = generate_summary(text, num_sentences=4).split()

In [23]:
print(*summary)

write python3 script takes mp3 audio track generate beautiful audio visualizer video
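
To make the ranking step in `generate_summary` easier to follow, here is a dependency-free sketch of the same idea. It is not the notebook's implementation: it swaps TF-IDF for raw term counts, replaces `nx.pagerank` with a hand-written power iteration, and drops self-similarity edges. The names `cosine` and `rank_sentences` and the toy sentences are made up for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count dictionaries."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(tokenized_sentences, damping=0.85, iterations=50):
    """Score sentences by power iteration over a similarity graph."""
    vectors = [Counter(tokens) for tokens in tokenized_sentences]
    n = len(vectors)
    # Edge weights: pairwise similarity, with self-similarity dropped.
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            incoming = 0.0
            for j in range(n):
                if out_weight[j] > 0:
                    incoming += scores[j] * weights[j][i] / out_weight[j]
                else:
                    # A sentence similar to nothing spreads its score evenly.
                    incoming += scores[j] / n
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


sentences = [
    ["python", "script", "audio", "video"],
    ["python", "script", "audio"],
    ["weather", "sunny", "today"],
]
scores = rank_sentences(sentences)
print([round(s, 3) for s in scores])
```

Sentences that share vocabulary reinforce each other, so the two related sentences end up with higher scores than the unrelated one. For a one-sentence prompt such as the one above there is nothing to rank, and the summary is simply the cleaned tokens of that sentence.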


In [24]:
conversations_data["ListOfCode"][12]

[{'ReplaceString': '[CODE_BLOCK_0]',
  'Type': 'bash',
  'Content': 'pip install moviepy numpy scipy matplotlib\n'},
 {'ReplaceString': '[CODE_BLOCK_1]',
  'Type': 'python',
  'Content': 'import moviepy.editor as mp\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom moviepy.video.fx import fadein\n\n# Define your input MP3 audio file and output video file\naudio_file = "input_audio.mp3"\noutput_file = "output_visualizer.mp4"\n\n# Load the audio file\naudio = mp.AudioFileClip(audio_file)\n\n# Set the duration of the video to match the audio\nvideo = mp.VideoClip(lambda t: [0], duration=audio.duration)\n\n# Define a function to generate a frame for each moment in time\ndef make_frame(t):\n    # Get the audio data at the given time\n    audio_data = audio.to_soundarray()[int(t * audio.fps)]\n    \n    # Create a simple audio visualizer using matplotlib\n    plt.figure(figsize=(10, 4))\n    plt.specgram(audio_data, Fs=audio.fps, cmap="viridis")\n    plt.axis(\'off\')\n    \n    # S

In [25]:
conversations_data["Prompt"][12]

'Write me python3 script that takes in mp3 audio track and generate a very beautiful audio visualizer video'

In [26]:
for i in keywords:
    if i in conversations_data["Prompt"][12]:
        print(i)

script
generate


In [27]:
text = conversations_data["Prompt"][12]
# Store the summary tokens so the loop checks this prompt rather than a stale variable
summary = generate_summary(text, num_sentences=4).split()
for i in keywords:
    if i in summary:
        print(i)

write
script
generate
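
The two checks above disagree on "write" because the first is a case-sensitive substring test on the raw prompt ("Write" is capitalized), while the second tests membership in the lowercased token list. Token membership in turn cannot match multi-word keywords ("how does") or capitalized language names. One possible middle ground, sketched here with a hypothetical `find_keywords` helper and a small sample list, is a case-insensitive whole-word search:

```python
import re


def find_keywords(text, keywords):
    """Return keywords that occur in text as whole words, ignoring case."""
    found = []
    for keyword in keywords:
        # Lookarounds instead of \b so that names such as "C++" or "C#" can match.
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(keyword)
    return found


prompt = ("Write me python3 script that takes in mp3 audio track "
          "and generate a very beautiful audio visualizer video")
sample = ["write", "script", "generate", "git", "C", "Python"]

substring_hits = [k for k in sample if k in prompt]  # plain substring test
whole_word_hits = find_keywords(prompt, sample)      # whole-word, case-insensitive

print(substring_hits)
print(whole_word_hits)
```

Note that whole-word matching does not treat "python3" as "Python"; whether version-suffixed names should count is a separate decision this sketch does not make.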


In [28]:
conversations_data["ListOfCode"][12]

[{'ReplaceString': '[CODE_BLOCK_0]',
  'Type': 'bash',
  'Content': 'pip install moviepy numpy scipy matplotlib\n'},
 {'ReplaceString': '[CODE_BLOCK_1]',
  'Type': 'python',
  'Content': 'import moviepy.editor as mp\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom moviepy.video.fx import fadein\n\n# Define your input MP3 audio file and output video file\naudio_file = "input_audio.mp3"\noutput_file = "output_visualizer.mp4"\n\n# Load the audio file\naudio = mp.AudioFileClip(audio_file)\n\n# Set the duration of the video to match the audio\nvideo = mp.VideoClip(lambda t: [0], duration=audio.duration)\n\n# Define a function to generate a frame for each moment in time\ndef make_frame(t):\n    # Get the audio data at the given time\n    audio_data = audio.to_soundarray()[int(t * audio.fps)]\n    \n    # Create a simple audio visualizer using matplotlib\n    plt.figure(figsize=(10, 4))\n    plt.specgram(audio_data, Fs=audio.fps, cmap="viridis")\n    plt.axis(\'off\')\n    \n    # S

### Writing a function combining everything together

In [29]:
def read_and_preprocess_text(text):
    # Tokenize the text into sentences
    sentences = nltk.sent_tokenize(text)

    # Tokenize each sentence into words, remove stopwords and punctuation, and convert to lowercase
    stop_words = set(nltk.corpus.stopwords.words('english') + list(string.punctuation))
    tokenized_sentences = [
        [word.lower() for word in nltk.word_tokenize(sentence) if word.isalnum() and word.lower() not in stop_words]
        for sentence in sentences
    ]

    return tokenized_sentences

def calculate_cosine_similarity(tfidf_matrix):
    # Calculate cosine similarity between sentences based on TF-IDF matrix
    similarity_matrix = cosine_similarity(tfidf_matrix, tfidf_matrix)
    return similarity_matrix

def generate_summary(text, num_sentences=3):
    # Preprocess the text
    tokenized_sentences = read_and_preprocess_text(text)

    # Convert sentences to TF-IDF matrix
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform([' '.join(sentence) for sentence in tokenized_sentences])

    # Calculate cosine similarity matrix
    similarity_matrix = calculate_cosine_similarity(tfidf_matrix)

    # Use the PageRank algorithm to rank sentences
    graph = nx.from_numpy_array(similarity_matrix)
    scores = nx.pagerank(graph)

    # Sort sentences by their scores and select the top ones
    ranked_sentences = sorted(((scores[i], sentence) for i, sentence in enumerate(tokenized_sentences)), reverse=True)
    summary_sentences = [sentence for score, sentence in ranked_sentences[:num_sentences]]

    # Join the summary sentences to form the final summary
    summary = ' '.join([' '.join(sentence) for sentence in summary_sentences])

    return summary

# Example usage
text = conversations_data["Prompt"][12]

summary = generate_summary(text, num_sentences=4).split()

keywords = ["code", "write", "script", "implement", "function", "algorithm", "debug", "error", 
            "fix", "issue", "resolve", "exception", "traceback", "explain", "understand", "concept", 
            "how does", "what is", "theory", "principle", "learn", "tutorial", "course", "resource", 
            "guide", "documentation", "example", "best practice", "design pattern", "architecture", 
            "efficient", "optimization", "structure", "model", "recommend", "tool", "library", 
            "framework", "technology", "platform", "software", "review", "optimize", "improve", 
            "enhance", "refactor", "performance", "efficiency", "plan", "approach", "strategy", 
            "project", "structure", "organize", "develop",  "integrate", "configure", "setup", 
            "install", "deploy", "connect", "environment", "git", "version control", "collaborate", 
            "merge", "branch", "commit", "repository", "generate"]

# Collect the keywords that appear among the summary tokens
keywords_in_summary = [keyword for keyword in keywords if keyword in summary]
print("Keywords found in summary are: ", keywords_in_summary)
print("\n")
print("Did ChatGPT provide code or no-code solution:")
if keywords_in_summary:
    # A non-empty ListOfCode means the answer contained at least one code block
    if conversations_data["ListOfCode"][12]:
        print("ChatGPT provided code for the given prompt.")
    else:
        print("ChatGPT provided no-code solution for the given prompt.")

Keywords found in summary are:  ['write', 'script', 'generate']


Did ChatGPT provide code or no-code solution:
ChatGPT provided code for the given prompt.
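
The single-prompt check above could be extended to a whole table along the following lines. This is a sketch under stated assumptions, not the notebook's code: it works on a list of dicts with `Prompt` and `ListOfCode` keys (the shape `conversations_data.to_dict("records")` would give, assuming no missing prompts), it replaces the summary step with plain lowercase whitespace tokenization, and `summarize_code_usage` and the sample rows are invented.

```python
from collections import Counter


def summarize_code_usage(records, keywords):
    """Tally, per keyword, how often the answer did or did not include code."""
    tally = {}
    for record in records:
        # Simplification: whitespace tokens instead of the ranked summary.
        tokens = set(record["Prompt"].lower().split())
        has_code = bool(record["ListOfCode"])
        for keyword in keywords:
            if keyword in tokens:
                counts = tally.setdefault(keyword, Counter())
                counts["code" if has_code else "no_code"] += 1
    return tally


# Invented rows for illustration only.
records = [
    {"Prompt": "Write a script to rename files",
     "ListOfCode": [{"Type": "python", "Content": "..."}]},
    {"Prompt": "Explain what a script is", "ListOfCode": []},
]
result = summarize_code_usage(records, ["write", "script", "explain"])
for keyword, counts in sorted(result.items()):
    print(keyword, dict(counts))
```

Sorting the tally by total count would then give a first view of which keywords are most common and how often each is answered with code.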


### Generalized code for all prompts in the given data

In [None]:
exa