As of this writing, Ravelry is a database cataloguing the details of <b>903,348</b> knitting, crochet, and other fiber art patterns. It's an amazing resource for crafters looking for inspiration for their next project, not to mention its myriad other features, from forums to pattern queues to yarn stash curation. That adds up to a wealth of material for any yarn enthusiast, and we are very excited to tap into it and try to draw some conclusions from the data we gather.

---

Before you begin, create a file called 'account.env' and paste the code below into it. Be sure to replace the placeholder values for the username and password with your own credentials, provided at https://www.ravelry.com/pro/developer. For this application, we are using the "Basic Auth, read only" level of access.

<code># .env
RAVELRY_USERNAME='Your Username'
RAVELRY_PASSWORD='Your Password'</code>

Once you have a file set up with your credentials, let's get started!

In [20]:
# settings.py
from pathlib import Path
from dotenv import load_dotenv

# Our credentials live in 'account.env' rather than the default '.env',
# so we pass the path explicitly.
env_path = Path('account.env')
load_dotenv(dotenv_path=env_path)

True

In [21]:
import os
RAVELRY_USERNAME = os.getenv("RAVELRY_USERNAME")
RAVELRY_PASSWORD = os.getenv("RAVELRY_PASSWORD")

At this point, I did a quick print to confirm that my credentials had loaded correctly. If you want the same reassurance, uncomment the two lines below to double check that both variables hold the right values in your own run of the code.

In [None]:
# print(RAVELRY_USERNAME)
# print(RAVELRY_PASSWORD)
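
If you would rather not echo your password into the notebook output, a check along these lines confirms that both values loaded without revealing them. This is only a sketch: `missing_credentials` is a hypothetical helper name, not part of python-dotenv or the Ravelry API.

```python
import os

def missing_credentials(names, env=os.environ):
    """Return the names of variables that are unset or empty (hypothetical helper)."""
    return [name for name in names if not env.get(name)]

missing = missing_credentials(["RAVELRY_USERNAME", "RAVELRY_PASSWORD"])
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("Both credentials loaded.")
```

The `env` argument defaults to the real environment, but any dict can be passed in to try the helper out.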

In [22]:
import requests
import json
import pandas as pd

During initial planning, we assumed we would have to navigate page by page to gather pattern data. As it happens, the search request has a page size parameter! As such, I upped our original plan from 560... to about 1% of Ravelry's total pattern database. We'll be requesting <b>over 9000</b> patterns.

We'll still need to navigate this using page numbers, as a single request for 9000 patterns will time out (status 504). Five calls of 2000 each will net 10,000 patterns, a bit over the 1% target, without being too cumbersome further along.
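
The five page requests described above could also be written as a loop. The sketch below assumes the same search endpoint and parameters used in this notebook. `fetch_search_pages` and its `get` argument are hypothetical names, with `get` standing in for `requests.get` so the function can be tried without a network connection.

```python
SEARCH_URL = "https://api.ravelry.com/patterns/search.json"

def fetch_search_pages(get, auth, pages=5, page_size=2000):
    """Request `pages` pages of popularity-sorted search results.

    `get` is any callable with the same signature as requests.get
    (an assumption made here so the function is easy to test).
    """
    patterns = []
    for page in range(1, pages + 1):
        params = {"sort": "popularity", "page_size": page_size, "page": page}
        response = get(SEARCH_URL, params=params, auth=auth)
        # Stop on a failed page rather than trying to parse an error body.
        response.raise_for_status()
        patterns.extend(response.json()["patterns"])
    return patterns
```

In this notebook it would be called as `fetch_search_pages(requests.get, (RAVELRY_USERNAME, RAVELRY_PASSWORD))`, returning one flat list of search results instead of five separate responses.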

In [73]:
url = "https://api.ravelry.com/patterns/search.json"
response1 = requests.get(url, params={"sort": "popularity", "page_size": 2000, "page": 1}, auth=(RAVELRY_USERNAME, RAVELRY_PASSWORD))
response2 = requests.get(url, params={"sort": "popularity", "page_size": 2000, "page": 2}, auth=(RAVELRY_USERNAME, RAVELRY_PASSWORD))
response3 = requests.get(url, params={"sort": "popularity", "page_size": 2000, "page": 3}, auth=(RAVELRY_USERNAME, RAVELRY_PASSWORD))
response4 = requests.get(url, params={"sort": "popularity", "page_size": 2000, "page": 4}, auth=(RAVELRY_USERNAME, RAVELRY_PASSWORD))
response5 = requests.get(url, params={"sort": "popularity", "page_size": 2000, "page": 5}, auth=(RAVELRY_USERNAME, RAVELRY_PASSWORD))

In [75]:
print(response1)
print(response2)
print(response3)
print(response4)
print(response5)

<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>


As long as you're seeing that sweet "<Response [200]>" you're golden. That means that your authentication has been successful and you made a connection.

In [76]:
patternData1 = response1.json()
patternData2 = response2.json()
patternData3 = response3.json()
patternData4 = response4.json()
patternData5 = response5.json()

In [77]:
len(patternData1['patterns'])

2000

In [78]:
type(patternData1['patterns'])

list

In [79]:
type(patternData1['patterns'][0])

dict

In [80]:
patternData1['patterns'][0].keys()

dict_keys(['free', 'id', 'name', 'permalink', 'personal_attributes', 'first_photo', 'designer', 'pattern_author', 'pattern_sources'])

In [81]:
patternData1['patterns'][0]

{'free': True,
 'id': 130787,
 'name': "Hermione's Everyday Socks",
 'permalink': 'hermiones-everyday-socks',
 'personal_attributes': None,
 'first_photo': {'id': 7278181,
  'sort_order': 1,
  'x_offset': -8,
  'y_offset': -35,
  'square_url': 'https://images4-g.ravelrycache.com/flickr/3/7/0/3704532404/3704532404_s.jpg',
  'medium_url': 'https://images4-g.ravelrycache.com/flickr/3/7/0/3704532404/3704532404.jpg',
  'thumbnail_url': 'https://images4-g.ravelrycache.com/flickr/3/7/0/3704532404/3704532404_t.jpg',
  'small_url': 'https://images4-g.ravelrycache.com/flickr/3/7/0/3704532404/3704532404_m.jpg',
  'medium2_url': 'https://images4-g.ravelrycache.com/flickr/3/7/0/3704532404/3704532404.jpg',
  'small2_url': 'https://images4-g.ravelrycache.com/flickr/3/7/0/3704532404/3704532404_n.jpg',
  'caption': None,
  'caption_html': None,
  'copyright_holder': None},
 'designer': {'crochet_pattern_count': 1,
  'favorites_count': 2106,
  'id': 14789,
  'knitting_pattern_count': 80,
  'name': 'Eric

In [82]:
patternID = patternData1['patterns'][0]['id']

In [83]:
getTestURL = f"https://api.ravelry.com/patterns/{patternID}.json"
testresponse = requests.get(getTestURL, auth=(RAVELRY_USERNAME, RAVELRY_PASSWORD))

In [84]:
testresponse

<Response [200]>

In [85]:
testData = testresponse.json()

In [86]:
testData['pattern'].keys()

dict_keys(['comments_count', 'created_at', 'currency', 'difficulty_average', 'difficulty_count', 'downloadable', 'favorites_count', 'free', 'gauge', 'gauge_divisor', 'gauge_pattern', 'generally_available', 'id', 'name', 'pdf_url', 'permalink', 'price', 'projects_count', 'published', 'queued_projects_count', 'rating_average', 'rating_count', 'row_gauge', 'updated_at', 'url', 'yardage', 'yardage_max', 'personal_attributes', 'sizes_available', 'product_id', 'currency_symbol', 'ravelry_download', 'download_location', 'pdf_in_library', 'volumes_in_library', 'gauge_description', 'yarn_weight_description', 'yardage_description', 'pattern_needle_sizes', 'notes_html', 'notes', 'packs', 'printings', 'yarn_weight', 'craft', 'pattern_categories', 'pattern_attributes', 'pattern_author', 'photos', 'pattern_type'])

In [87]:
testData

{'pattern': {'comments_count': 148,
  'created_at': '2009/07/09 11:36:45 -0400',
  'currency': None,
  'difficulty_average': 2.332074416002238,
  'difficulty_count': 7149,
  'downloadable': True,
  'favorites_count': 52672,
  'free': True,
  'gauge': 36.0,
  'gauge_divisor': 4,
  'gauge_pattern': 'stockinette stitch  ',
  'generally_available': '2009/07/01 00:00:00 -0400',
  'id': 130787,
  'name': "Hermione's Everyday Socks",
  'pdf_url': '',
  'permalink': 'hermiones-everyday-socks',
  'price': None,
  'projects_count': 28663,
  'published': '2009/07/01',
  'queued_projects_count': 11574,
  'rating_average': 4.628989547038327,
  'rating_count': 7175,
  'row_gauge': None,
  'updated_at': '2017/01/03 08:29:11 -0500',
  'url': 'http://dreamsinfiber.blogspot.com/2009/07/hermoines-everyday-socks-free-pattern.html#links',
  'yardage': 350,
  'yardage_max': 400,
  'personal_attributes': None,
  'sizes_available': "Women's med",
  'product_id': 29899,
  'currency_symbol': None,
  'ravelry_do

Our first call to the Ravelry API snags us the top 2000 patterns based on popularity. This is great, and if nothing else it gives us the pattern ID for each of these patterns. We can use that ID to call the API again and get more detailed information about each pattern, which is the real information we want at this point.

Looking over the keys, these are the ones that look most interesting for analysis purposes:
- difficulty_average
- rating_average
- projects_count
- downloadable
- free
- published
- price
- yardage
- yardage_max
- yarn_weight > name
- craft > name
- pattern_type > name

These are a few traits that I feel are consistent enough across Ravelry's pattern offerings to support various analyses. We could find that the most popular patterns tend to have a low difficulty and a low maximum yardage, or most commonly use a particular weight of yarn. It could even be something as simple as the fact that they are free and available as a download directly from Ravelry, which is purely a matter of convenience.
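
Pulling the fields listed above out of one detail response could be sketched as below. The nested entries (`yarn_weight`, `craft`, `pattern_type`) are read through an empty dict when missing or null, so an absent value becomes `None`. `extract_fields` is a hypothetical helper, and the sample dict is an abbreviated, partly made-up stand-in for a real response (the null `craft` is there only to exercise the fallback).

```python
FLAT_KEYS = [
    "difficulty_average", "rating_average", "projects_count", "downloadable",
    "free", "published", "price", "yardage", "yardage_max",
]
NESTED_KEYS = ["yarn_weight", "craft", "pattern_type"]

def extract_fields(detail):
    """Flatten one pattern-detail response into a single dict (hypothetical helper)."""
    pattern = detail.get("pattern") or {}
    row = {"id": pattern.get("id")}
    for key in FLAT_KEYS:
        row[key] = pattern.get(key)
    for key in NESTED_KEYS:
        # The value may be missing or None, so fall back to an empty dict.
        row[key] = (pattern.get(key) or {}).get("name")
    return row

# Illustrative input only; a real response carries many more keys.
sample = {"pattern": {"id": 130787, "free": True, "yardage": 350,
                      "yarn_weight": {"name": "Fingering"}, "craft": None}}
print(extract_fields(sample))
```

Returning one dict per pattern keeps each pattern's values together, which makes it harder for the columns to drift out of step than with thirteen parallel lists.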

Next, let's initialize the list containers and confirm the indexing for each value. We'll follow that up by writing the loops that will snag our data.

In [101]:
id_ls = []
difficulty_average_ls = []
rating_average_ls = []
projects_count_ls = []
downloadable_ls = []
free_ls = []
published_ls = []
price_ls = []
yardage_ls = []
yardage_max_ls = []
yarn_weight_ls = []
craft_ls = []
pattern_type_ls = []

In [102]:
patternset1 = patternData1['patterns']

In [103]:
for pattern in patternset1:
    patternID = pattern['id']
    loopURL = f"https://api.ravelry.com/patterns/{patternID}.json"
    loop_response = requests.get(loopURL, auth=(RAVELRY_USERNAME, RAVELRY_PASSWORD))
    pattern_loop = loop_response.json()
    
    try: difficulty_average = pattern_loop['pattern']['difficulty_average']
    except: difficulty_average = None
        
    try: rating_average = pattern_loop['pattern']['rating_average']
    except: rating_average = None
        
    try: projects_count = pattern_loop['pattern']['projects_count']
    except: projects_count = None
        
    try: downloadable = pattern_loop['pattern']['downloadable']
    except: downloadable = None
        
    try: free = pattern_loop['pattern']['free']
    except: free = None
        
    try: published = pattern_loop['pattern']['published']
    except: published = None
        
    try: price = pattern_loop['pattern']['price']
    except: price = None
        
    try: yardage = pattern_loop['pattern']['yardage']
    except: yardage = None
        
    try: yardage_max = pattern_loop['pattern']['yardage_max']
    except: yardage_max = None
        
    try: yarn_weight = pattern_loop['pattern']['yarn_weight']['name']
    except: yarn_weight = None
        
    try: craft = pattern_loop['pattern']['craft']['name']
    except: craft = None
        
    try: pattern_type = pattern_loop['pattern']['pattern_type']['name']
    except: pattern_type = None
    
    id_ls.append(patternID)
    difficulty_average_ls.append(difficulty_average)
    rating_average_ls.append(rating_average)
    projects_count_ls.append(projects_count)
    downloadable_ls.append(downloadable)
    free_ls.append(free)
    published_ls.append(published)
    price_ls.append(price)
    yardage_ls.append(yardage)
    yardage_max_ls.append(yardage_max)
    yarn_weight_ls.append(yarn_weight)
    craft_ls.append(craft)
    pattern_type_ls.append(pattern_type)

In [105]:
print(len(id_ls))
print(len(difficulty_average_ls))
print(len(rating_average_ls))
print(len(projects_count_ls))
print(len(downloadable_ls))
print(len(free_ls))
print(len(published_ls))
print(len(price_ls))
print(len(yardage_ls))
print(len(yardage_max_ls))
print(len(yarn_weight_ls))
print(len(craft_ls))
print(len(pattern_type_ls))

2000
2000
2000
2000
2000
2000
2000
2000
2000
2000
2000
2000
2000


In [106]:
patternset2 = patternData2['patterns']
patternset3 = patternData3['patterns']
patternset4 = patternData4['patterns']
patternset5 = patternData5['patterns']

In [107]:
for pattern in patternset2:
    patternID = pattern['id']
    loopURL = f"https://api.ravelry.com/patterns/{patternID}.json"
    loop_response = requests.get(loopURL, auth=(RAVELRY_USERNAME, RAVELRY_PASSWORD))
    pattern_loop = loop_response.json()
    
    try: difficulty_average = pattern_loop['pattern']['difficulty_average']
    except: difficulty_average = None
        
    try: rating_average = pattern_loop['pattern']['rating_average']
    except: rating_average = None
        
    try: projects_count = pattern_loop['pattern']['projects_count']
    except: projects_count = None
        
    try: downloadable = pattern_loop['pattern']['downloadable']
    except: downloadable = None
        
    try: free = pattern_loop['pattern']['free']
    except: free = None
        
    try: published = pattern_loop['pattern']['published']
    except: published = None
        
    try: price = pattern_loop['pattern']['price']
    except: price = None
        
    try: yardage = pattern_loop['pattern']['yardage']
    except: yardage = None
        
    try: yardage_max = pattern_loop['pattern']['yardage_max']
    except: yardage_max = None
        
    try: yarn_weight = pattern_loop['pattern']['yarn_weight']['name']
    except: yarn_weight = None
        
    try: craft = pattern_loop['pattern']['craft']['name']
    except: craft = None
        
    try: pattern_type = pattern_loop['pattern']['pattern_type']['name']
    except: pattern_type = None
    
    id_ls.append(patternID)
    difficulty_average_ls.append(difficulty_average)
    rating_average_ls.append(rating_average)
    projects_count_ls.append(projects_count)
    downloadable_ls.append(downloadable)
    free_ls.append(free)
    published_ls.append(published)
    price_ls.append(price)
    yardage_ls.append(yardage)
    yardage_max_ls.append(yardage_max)
    yarn_weight_ls.append(yarn_weight)
    craft_ls.append(craft)
    pattern_type_ls.append(pattern_type)

In [108]:
for pattern in patternset3:
    patternID = pattern['id']
    loopURL = f"https://api.ravelry.com/patterns/{patternID}.json"
    loop_response = requests.get(loopURL, auth=(RAVELRY_USERNAME, RAVELRY_PASSWORD))
    pattern_loop = loop_response.json()
    
    try: difficulty_average = pattern_loop['pattern']['difficulty_average']
    except: difficulty_average = None
        
    try: rating_average = pattern_loop['pattern']['rating_average']
    except: rating_average = None
        
    try: projects_count = pattern_loop['pattern']['projects_count']
    except: projects_count = None
        
    try: downloadable = pattern_loop['pattern']['downloadable']
    except: downloadable = None
        
    try: free = pattern_loop['pattern']['free']
    except: free = None
        
    try: published = pattern_loop['pattern']['published']
    except: published = None
        
    try: price = pattern_loop['pattern']['price']
    except: price = None
        
    try: yardage = pattern_loop['pattern']['yardage']
    except: yardage = None
        
    try: yardage_max = pattern_loop['pattern']['yardage_max']
    except: yardage_max = None
        
    try: yarn_weight = pattern_loop['pattern']['yarn_weight']['name']
    except: yarn_weight = None
        
    try: craft = pattern_loop['pattern']['craft']['name']
    except: craft = None
        
    try: pattern_type = pattern_loop['pattern']['pattern_type']['name']
    except: pattern_type = None
    
    id_ls.append(patternID)
    difficulty_average_ls.append(difficulty_average)
    rating_average_ls.append(rating_average)
    projects_count_ls.append(projects_count)
    downloadable_ls.append(downloadable)
    free_ls.append(free)
    published_ls.append(published)
    price_ls.append(price)
    yardage_ls.append(yardage)
    yardage_max_ls.append(yardage_max)
    yarn_weight_ls.append(yarn_weight)
    craft_ls.append(craft)
    pattern_type_ls.append(pattern_type)

In [109]:
for pattern in patternset4:
    patternID = pattern['id']
    loopURL = f"https://api.ravelry.com/patterns/{patternID}.json"
    loop_response = requests.get(loopURL, auth=(RAVELRY_USERNAME, RAVELRY_PASSWORD))
    pattern_loop = loop_response.json()
    
    try: difficulty_average = pattern_loop['pattern']['difficulty_average']
    except: difficulty_average = None
        
    try: rating_average = pattern_loop['pattern']['rating_average']
    except: rating_average = None
        
    try: projects_count = pattern_loop['pattern']['projects_count']
    except: projects_count = None
        
    try: downloadable = pattern_loop['pattern']['downloadable']
    except: downloadable = None
        
    try: free = pattern_loop['pattern']['free']
    except: free = None
        
    try: published = pattern_loop['pattern']['published']
    except: published = None
        
    try: price = pattern_loop['pattern']['price']
    except: price = None
        
    try: yardage = pattern_loop['pattern']['yardage']
    except: yardage = None
        
    try: yardage_max = pattern_loop['pattern']['yardage_max']
    except: yardage_max = None
        
    try: yarn_weight = pattern_loop['pattern']['yarn_weight']['name']
    except: yarn_weight = None
        
    try: craft = pattern_loop['pattern']['craft']['name']
    except: craft = None
        
    try: pattern_type = pattern_loop['pattern']['pattern_type']['name']
    except: pattern_type = None
    
    id_ls.append(patternID)
    difficulty_average_ls.append(difficulty_average)
    rating_average_ls.append(rating_average)
    projects_count_ls.append(projects_count)
    downloadable_ls.append(downloadable)
    free_ls.append(free)
    published_ls.append(published)
    price_ls.append(price)
    yardage_ls.append(yardage)
    yardage_max_ls.append(yardage_max)
    yarn_weight_ls.append(yarn_weight)
    craft_ls.append(craft)
    pattern_type_ls.append(pattern_type)

In [110]:
for pattern in patternset5:
    patternID = pattern['id']
    loopURL = f"https://api.ravelry.com/patterns/{patternID}.json"
    loop_response = requests.get(loopURL, auth=(RAVELRY_USERNAME, RAVELRY_PASSWORD))
    pattern_loop = loop_response.json()
    
    try: difficulty_average = pattern_loop['pattern']['difficulty_average']
    except: difficulty_average = None
        
    try: rating_average = pattern_loop['pattern']['rating_average']
    except: rating_average = None
        
    try: projects_count = pattern_loop['pattern']['projects_count']
    except: projects_count = None
        
    try: downloadable = pattern_loop['pattern']['downloadable']
    except: downloadable = None
        
    try: free = pattern_loop['pattern']['free']
    except: free = None
        
    try: published = pattern_loop['pattern']['published']
    except: published = None
        
    try: price = pattern_loop['pattern']['price']
    except: price = None
        
    try: yardage = pattern_loop['pattern']['yardage']
    except: yardage = None
        
    try: yardage_max = pattern_loop['pattern']['yardage_max']
    except: yardage_max = None
        
    try: yarn_weight = pattern_loop['pattern']['yarn_weight']['name']
    except: yarn_weight = None
        
    try: craft = pattern_loop['pattern']['craft']['name']
    except: craft = None
        
    try: pattern_type = pattern_loop['pattern']['pattern_type']['name']
    except: pattern_type = None
    
    id_ls.append(patternID)
    difficulty_average_ls.append(difficulty_average)
    rating_average_ls.append(rating_average)
    projects_count_ls.append(projects_count)
    downloadable_ls.append(downloadable)
    free_ls.append(free)
    published_ls.append(published)
    price_ls.append(price)
    yardage_ls.append(yardage)
    yardage_max_ls.append(yardage_max)
    yarn_weight_ls.append(yarn_weight)
    craft_ls.append(craft)
    pattern_type_ls.append(pattern_type)
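
The five loops above differ only in which pattern set they iterate over, so they could be folded into a single function. A minimal sketch, assuming the same detail endpoint as above: `collect_details` and `fetch_detail_from_api` are hypothetical names, and `get` stands in for `requests.get` so the logic can be tried offline.

```python
def collect_details(pattern_sets, fetch_detail):
    """Fetch the detail record for every pattern in every set.

    `fetch_detail` is a hypothetical callable that takes a pattern ID
    and returns the parsed JSON for that pattern.
    """
    details = []
    for pattern_set in pattern_sets:
        for pattern in pattern_set:
            details.append(fetch_detail(pattern["id"]))
    return details

def fetch_detail_from_api(pattern_id, get, auth):
    """One possible fetch_detail, with `get` standing in for requests.get."""
    url = f"https://api.ravelry.com/patterns/{pattern_id}.json"
    return get(url, auth=auth).json()
```

In this notebook the call might look like `collect_details([patternset1, patternset2, patternset3, patternset4, patternset5], lambda pid: fetch_detail_from_api(pid, requests.get, (RAVELRY_USERNAME, RAVELRY_PASSWORD)))`.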

In [111]:
print(len(id_ls))
print(len(difficulty_average_ls))
print(len(rating_average_ls))
print(len(projects_count_ls))
print(len(downloadable_ls))
print(len(free_ls))
print(len(published_ls))
print(len(price_ls))
print(len(yardage_ls))
print(len(yardage_max_ls))
print(len(yarn_weight_ls))
print(len(craft_ls))
print(len(pattern_type_ls))

10000
10000
10000
10000
10000
10000
10000
10000
10000
10000
10000
10000
10000
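
The thirteen length checks above could also be collapsed into one comparison. A small sketch, where `mismatched_lengths` is a hypothetical helper and the short lists are stand-ins for the real ones:

```python
def mismatched_lengths(columns, expected):
    """Map column name -> length for every list whose length is not `expected`."""
    return {
        name: len(values)
        for name, values in columns.items()
        if len(values) != expected
    }

# Stand-in lists; in the notebook these would be id_ls, price_ls, and so on.
columns = {"id": [1, 2, 3], "price": [None, 1.0, 8.0], "craft": ["Knitting"]}
print(mismatched_lengths(columns, expected=3))
```

An empty dict means every list has the expected length; anything else names the lists that need a second look.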


In [115]:
TopPatternList = list(zip(id_ls, difficulty_average_ls, rating_average_ls, projects_count_ls,
                         downloadable_ls, free_ls, published_ls, price_ls, yardage_ls,
                         yardage_max_ls, yarn_weight_ls, craft_ls, pattern_type_ls))

colnames = ['Pattern ID', 'Average Difficulty', 'Average Rating', 'Total Projects', 'Downloadable', 'Free',
           'Publish Date','Price', 'Min Yardage', 'Max Yardage', 'Yarn Weight', 'Craft', 'Category']

In [116]:
import pandas as pd

df = pd.DataFrame(TopPatternList, columns=colnames)

df.head(10)

,Pattern ID,Average Difficulty,Average Rating,Total Projects,Downloadable,Free,Publish Date,Price,Min Yardage,Max Yardage,Yarn Weight,Craft,Category
0,130787,2.332074,4.62899,28663,True,True,2009/07/01,,350.0,400.0,Fingering,Knitting,Socks
1,418518,2.534649,4.804386,20206,True,False,2013/07/01,1.0,,,Fingering,Knitting,Other
2,426231,1.639235,4.711647,20770,True,True,2013/07/01,,70.0,170.0,Worsted,Knitting,Child
3,443533,2.297305,4.726276,14761,True,True,2013/10/01,,280.0,1800.0,Aran,Knitting,Child
4,124400,1.506778,4.639974,21161,True,True,2009/05/01,,155.0,415.0,Fingering,Knitting,Child
5,709323,1.886782,4.795863,8181,True,True,2016/12/01,,70.0,300.0,Any gauge,Knitting,Mittens/Gloves
6,788421,2.909037,4.75043,6244,True,False,2017/11/01,8.0,915.0,1919.0,Worsted,Knitting,Pullover
7,580119,2.093617,4.781236,10400,True,True,2015/05/01,,200.0,350.0,Fingering,Knitting,Socks
8,315418,2.155676,4.647897,13695,True,True,,,350.0,400.0,Fingering,Knitting,Socks
9,588220,2.288232,4.606033,11319,True,True,2015/06/01,,380.0,420.0,Light Fingering,Knitting,Shawl/Wrap


In [124]:
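# Assign the result back instead of calling inplace=True on a column
# selection, which newer pandas versions warn about.
df['Price'] = df['Price'].fillna(0)
df['Min Yardage'] = df['Min Yardage'].fillna(0)
df['Max Yardage'] = df['Max Yardage'].fillna(0)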
df.head()

,Pattern ID,Average Difficulty,Average Rating,Total Projects,Downloadable,Free,Publish Date,Price,Min Yardage,Max Yardage,Yarn Weight,Craft,Category
0,130787,2.332074,4.62899,28663,True,True,2009/07/01,0.0,350.0,400.0,Fingering,Knitting,Socks
1,418518,2.534649,4.804386,20206,True,False,2013/07/01,1.0,0.0,0.0,Fingering,Knitting,Other
2,426231,1.639235,4.711647,20770,True,True,2013/07/01,0.0,70.0,170.0,Worsted,Knitting,Child
3,443533,2.297305,4.726276,14761,True,True,2013/10/01,0.0,280.0,1800.0,Aran,Knitting,Child
4,124400,1.506778,4.639974,21161,True,True,2009/05/01,0.0,155.0,415.0,Fingering,Knitting,Child


In [126]:
df.isnull().sum()

Pattern ID              0
Average Difficulty      0
Average Rating          0
Total Projects          0
Downloadable            0
Free                    0
Publish Date          754
Price                   0
Min Yardage             0
Max Yardage             0
Yarn Weight           176
Craft                   0
Category                0
dtype: int64

In [127]:
df['Yarn Weight'] = df['Yarn Weight'].fillna('None Specified')

In [128]:
df.isnull().sum()

Pattern ID              0
Average Difficulty      0
Average Rating          0
Total Projects          0
Downloadable            0
Free                    0
Publish Date          754
Price                   0
Min Yardage             0
Max Yardage             0
Yarn Weight             0
Craft                   0
Category                0
dtype: int64

In [129]:
df.to_csv('TopRavelryPatternList.csv', index=False, header=True)

I've opted to accept the null values for Publish Date, as these won't make or break any algorithms moving forward; the nulls can simply be ignored as needed to preserve the other entries should we need to do analysis based on date.
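
Ignoring the null publish dates as needed might look like the sketch below. It assumes the `YYYY/MM/DD` string format seen in the output above. The three-row frame is stand-in data for illustration, using IDs and dates from the first rows of the real table.

```python
import pandas as pd

# Stand-in rows mirroring two of the real columns (illustrative only).
sample = pd.DataFrame({
    "Pattern ID": [130787, 315418, 588220],
    "Publish Date": ["2009/07/01", None, "2015/06/01"],
})

# Parse the date strings; missing values become NaT.
sample["Publish Date"] = pd.to_datetime(sample["Publish Date"], format="%Y/%m/%d")

# Drop undated rows only for date-based analysis, leaving `sample` intact.
dated = sample.dropna(subset=["Publish Date"])
print(dated["Publish Date"].dt.year.tolist())
```

Because `dropna` returns a new frame, the undated patterns stay available for any analysis that does not depend on the publish date.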

But there we have it. All 10,000 rows have been written to a .csv file for easy use later in Tableau for presentation visuals, or to import for further analysis.