# SDK Examples - Querying Data Models

This notebook demonstrates some of the most common query operators in the Sight Machine SDK, using the demo environment (https://demo.sightmachine.io).

In [1]:
from smsdk import client
from datetime import datetime, timedelta
import pandas as pd

## Initialize the SDK client and get a list of all machine types

In [2]:
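# Fill in your API credentials before running this cell
api_key = ''
api_secret = ''
cli = client.Client('demo')  # 'demo' is the tenant name for demo.sightmachine.io
cli.login('apikey',
          key_id=api_key,
          secret_id=api_secret)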

types = cli.get_machine_type_names()
types

['Body Maker',
 'Cupping Press',
 'Tester',
 'Necker',
 'z Coolant',
 'Line Efficiency',
 'Internal Lacquer',
 'Washer',
 'Decorator',
 'z Air Compressor',
 'Internal Bake Oven',
 'Final Line',
 'Effluent']

# Working with Cycles

Cycles are the core data set in the SM Platform.  A cycle represents a unit of work on a machine and can contain data from a variety of sources: sensors, quality management systems, ERP, MES, etc.

Each cycle is associated with a machine and a time range.  Each machine has a machine type, which determines its data schema.  To query for cycle data, first look up the machine type, then look up the specific machine(s) of that type.

In [3]:
# list machines of a specific type
machine_type = types[0]
machines = cli.get_machine_names(source_type=machine_type)
machines

['F2 BM 5',
 'F3 BM 7',
 'F2 BM 6',
 'F3 BM 6',
 'F3 BM 3',
 'F2 BM 8',
 'F3 BM 4',
 'BM 1',
 'BM 3',
 'F2 BM 3',
 'BM 8',
 'F3 BM 1',
 'F2 BM 4',
 'BM 6',
 'BM 4',
 'F3 BM 2',
 'F2 BM 1',
 'F3 BM 5',
 'F3 BM 8',
 'BM 2',
 'BM 5',
 'BM 7',
 'F2 BM 2',
 'F2 BM 7']

In [4]:
# retrieve the schema for a particular machine (more on this at end of notebook)
# extract only a list of tag display names
columns = cli.get_machine_schema(machines[0])['display'].to_list()

# going to skip the first 8 fields since those are our internal / common fields
columns = columns[8:]
columns

['0_BM 008: Cans Out',
 '0_BM: CPM',
 'BM 001: Cans Out',
 'BM 002: Cans Out',
 'BM 003: Cans Out',
 'BM 004: Cans Out',
 'BM 005: Cans Out',
 'BM 006: Cans Out',
 'BM 007: Cans Out',
 'BM : Machine status',
 'BM: Air Temp.',
 'BM: Apparent Temp.',
 'BM: Axial Load',
 'BM: CUP Cans Out Total',
 'BM: Can Weight',
 'BM: Dew Point',
 'BM: Dome Depth',
 'BM: MSL Pressure',
 'BM: Machine Status',
 'BM: Machine Status Reasons',
 'BM: Primary Tear Off',
 'BM: Relative Humidity',
 'BM: Thickwall Avg',
 'BM: Thickwall Variation',
 'BM: Thinwall Avg',
 'BM: Trim Height Avg',
 'BM: Trim Height Max',
 'BM: Trim Height Min',
 'BM: Waiting Upstream Timeseries - Raw',
 'BM: Wind Direction',
 'BM: Wind Speed',
 'Machine Status',
 'availability',
 'denominator',
 'edge_arrival_timestamp',
 'is_down',
 'performance',
 'quality',
 'timestamp']

### Selecting a Particular Development Pipeline Schema

You can select a development pipeline schema as shown in the following example. This works much like the 'in-use' feature in MA: you select an alternate pipeline to treat as the production one. As in MA, the setting persists until you change it back or create a new client.

*Note: By default, the production pipeline schema will be used (just like in MA).*

In [None]:
db_schema = 'pipeline_id'  # placeholder: replace with the ID of the pipeline you want to use
cli.select_db_schema(schema_name=db_schema)

## A basic starting query

Once you have a machine type and machine, you can start to query for cycle data.  We'll use variations on this theme to demonstrate different query options and their effects.

Note that this baseline query already demonstrates:
- Basic filter rules formatted as key-value pairs
- Filtering for greater-than or less-than values
    - It uses `__gte` for greater than or equal.  Use `__gt` for greater than.  Similarly, `__lte` is less than or equal and `__lt` is less than.
- Sorting returned results
    - The `-` prefix before `End Time` sorts in descending order.  To sort in ascending order, omit the prefix.
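The same filter dictionary can be built by a small helper so the time window and sort order stay consistent across queries. This is a minimal sketch: `build_cycle_query` is a hypothetical helper, not part of the SDK, and it only assembles the keys used in the query below (`Machine`, `End Time__gte`, `End Time__lte`, `_order_by`).

```python
from datetime import datetime


def build_cycle_query(machine, start, end, descending=True, **extra):
    """Assemble a get_cycles() keyword dict for one machine and an End Time window.

    Hypothetical helper: it only builds the dict; pass it as cli.get_cycles(**query).
    """
    if start > end:
        raise ValueError("start must not be after end")
    query = {
        "Machine": machine,
        "End Time__gte": start,  # inclusive lower bound
        "End Time__lte": end,    # inclusive upper bound
        # A leading '-' sorts descending; no prefix sorts ascending
        "_order_by": ("-" if descending else "") + "End Time",
    }
    query.update(extra)  # additional filters or options, e.g. _only=[...], _limit=500
    return query


query = build_cycle_query("F2 BM 5", datetime(2023, 4, 1), datetime(2023, 4, 2))
print(query["_order_by"])  # -End Time
```

Filters whose names contain spaces or colons can still be passed through `**extra` by unpacking a dict, e.g. `build_cycle_query(machines[0], start, end, **{'0_BM: CPM__ne': 0})`, and the result is used as `cli.get_cycles(**query)`.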

In [5]:
query = {'Machine': machines[0],
         'End Time__gte' : datetime(2023, 4, 1), 
         'End Time__lte' : datetime(2023, 4, 2), 
         '_order_by': '-End Time'}
df = cli.get_cycles(**query)

print(f'Size of returned data: {df.shape}')
df.head()

_limit not specified.  Maximum of 5000 rows will be returned.
_only not specified.  Selecting first 50 fields.
Size of returned data: (1441, 47)


_id,Machine,Start Time,End Time,Production Day,Cycle Time (Net),Cycle Time (Gross),Shift,Output,0_BM 008: Cans Out,0_BM: CPM,...,BM: Wind Direction,BM: Wind Speed,Machine Status,availability,denominator,edge_arrival_timestamp,is_down,performance,quality,timestamp
nE1r2Z5jsUnbhC6LM6FnmZ3J/Cvk1tIzftSEaGVK0gQ=,F2 BM 5,2023-04-01 23:59:00,2023-04-02 00:00:00,2023-04-01,60000.0,60000.0,Night,1.0,83780.0,244.0,...,,,,88.0,100.0,2023-04-01 23:59:57.010326,0.0,89.0,91.0,2023-04-01 23:59:00.000000
9I7GM6+PrgtQWmu+rVEMbFB1ZY9PJxqqmBCPq6JBDHc=,F2 BM 5,2023-04-01 23:58:00,2023-04-01 23:59:00,2023-04-01,60000.0,60000.0,Night,1.0,83520.0,255.0,...,,,,95.0,100.0,2023-04-01 23:58:58.557272,0.0,96.0,98.0,2023-04-01 23:58:00.000000
sf+ndwHHZwrXSWGn1JT8g5hcKWmmGF7aCzZD9yuFmHM=,F2 BM 5,2023-04-01 23:57:00,2023-04-01 23:58:00,2023-04-01,60000.0,60000.0,Night,1.0,83300.0,240.0,...,,,,90.0,100.0,2023-04-01 23:58:01.691775,0.0,91.0,93.0,2023-04-01 23:57:00.000000
tLf0CWkXHSd+JlwMncP3tv713IEv7sYGqxDcdCwx8f0=,F2 BM 5,2023-04-01 23:56:00,2023-04-01 23:57:00,2023-04-01,60000.0,60000.0,Night,1.0,83060.0,240.0,...,,,,91.0,100.0,2023-04-01 23:57:02.272964,0.0,92.0,94.0,2023-04-01 23:56:00.000000
/+advT4Fxu2ZFgPiegPC6TvwxDqXJjSThpcQveo4Ngo=,F2 BM 5,2023-04-01 23:55:00,2023-04-01 23:56:00,2023-04-01,60000.0,60000.0,Night,1.0,82800.0,240.0,...,,,,93.0,100.0,2023-04-01 23:56:00.845190,0.0,94.0,96.0,2023-04-01 23:55:00.000000


# Selecting columns and silencing the `_only` Warning

To select a specific set of columns, provide a list of column names as the value for the `_only` key.  For example, `'_only': ['column1', 'column2', 'column3']`.

If you do not use `_only`, the SDK automatically selects the first 50 stats in the machine's configuration, plus common metadata fields for the query.

You can also pass `'_only': '*'`, which returns every field, including a large number of internal fields.  Since this includes many fields you probably will not need, expect the resulting queries to be quite slow.

**IMPORTANT** If a selected column is all null, it will not be included in the returned data frame.  If you are getting fewer columns back than expected, this most likely means that those columns contained only null data.
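Because all-null columns are silently dropped, it can help to compare the columns you asked for with the columns that came back. A minimal sketch, assuming you pass the returned column names as a list (for a DataFrame, `list(df.columns)`); `missing_columns` is a hypothetical helper and the lists in the demo are illustrative.

```python
def missing_columns(requested, returned):
    """Return requested column names absent from the result, in request order.

    Per the note above, a selected column that is all null is omitted, so a name
    listed here most likely had no data for the query's filters (or is misspelled).
    """
    returned_set = set(returned)
    return [col for col in requested if col not in returned_set]


# Illustrative lists: what was requested vs. what the result frame contained
requested = ["Machine", "End Time", "BM: Trim Height Avg", "BM: Wind Speed"]
returned = ["Machine", "End Time", "BM: Trim Height Avg"]
print(missing_columns(requested, returned))  # ['BM: Wind Speed']
```

With the cell below, this would be called as `missing_columns(select_columns, df.columns)`.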



In [6]:
# Get the first 5 schema columns, plus Machine and End Time
select_columns = ['Machine', 'End Time'] + columns[:5]

query = {'Machine': machines[0],
         'End Time__gte' : datetime(2023, 4, 1), 
         'End Time__lte' : datetime(2023, 4, 2),  
         '_order_by': '-End Time',
         '_only': select_columns}
df = cli.get_cycles(**query)

print(f'Size of returned data: {df.shape}')
df.head()

_limit not specified.  Maximum of 5000 rows will be returned.
Size of returned data: (1441, 7)


_id,Machine,End Time,0_BM 008: Cans Out,0_BM: CPM,BM 001: Cans Out,BM 002: Cans Out,BM 003: Cans Out
nE1r2Z5jsUnbhC6LM6FnmZ3J/Cvk1tIzftSEaGVK0gQ=,F2 BM 5,2023-04-02 00:00:00,83780.0,244.0,106580.0,93400.0,90400.0
9I7GM6+PrgtQWmu+rVEMbFB1ZY9PJxqqmBCPq6JBDHc=,F2 BM 5,2023-04-01 23:59:00,83520.0,255.0,106320.0,93160.0,90180.0
sf+ndwHHZwrXSWGn1JT8g5hcKWmmGF7aCzZD9yuFmHM=,F2 BM 5,2023-04-01 23:58:00,83300.0,240.0,106060.0,92920.0,89940.0
tLf0CWkXHSd+JlwMncP3tv713IEv7sYGqxDcdCwx8f0=,F2 BM 5,2023-04-01 23:57:00,83060.0,240.0,105820.0,92680.0,89700.0
/+advT4Fxu2ZFgPiegPC6TvwxDqXJjSThpcQveo4Ngo=,F2 BM 5,2023-04-01 23:56:00,82800.0,240.0,105580.0,92420.0,89460.0


## Restricting the number of rows returned with `_limit` and `_offset`

To restrict the number of rows, use the `_limit` query option.  For example, `'_limit': 500` returns at most 500 rows.

To skip over a specified number of rows, use the `_offset` query option.  For example, `'_offset': 50`.

It is common to use `_limit` and `_offset` together, for example to paginate data.  If a query would normally return 100 rows and you want to split it into two queries, you could return the first 50 rows with `'_offset': 0, '_limit': 50` and then the second 50 rows with `'_offset': 50, '_limit': 50`.
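To pull more rows than a single query returns, the offset/limit pattern can be wrapped in a loop. A minimal sketch: `fetch_all_pages` is a hypothetical helper, and `fetch_page` is any callable that takes `(offset, limit)` and returns a sized result. The demo pages through a plain list so it runs offline.

```python
def fetch_all_pages(fetch_page, page_size=5000):
    """Collect results page by page using offset/limit paging.

    fetch_page(offset, limit) must return a sized sequence of rows (a list, or a
    pandas DataFrame). Paging stops at the first page shorter than page_size.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    pages = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if len(page) > 0:
            pages.append(page)
        if len(page) < page_size:
            return pages
        offset += page_size


# Offline demo: 12 integers stand in for the server-side result set
rows = list(range(12))
pages = fetch_all_pages(lambda off, lim: rows[off:off + lim], page_size=5)
print([len(p) for p in pages])  # [5, 5, 2]
```

Against the SDK, `fetch_page` could be `lambda off, lim: cli.get_cycles(**query, _offset=off, _limit=lim)`, assuming `query` does not already contain `_offset` or `_limit` and keeps an `_order_by` so page boundaries are stable; the pages can then be combined with `pd.concat(pages)` when the list is non-empty. The default of 5000 simply mirrors the row cap reported by the SDK's warning.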

In [7]:
query = {'Machine': machines[0],
         'End Time__gte' : datetime(2023, 4, 1), 
         'End Time__lte' : datetime(2023, 4, 2),  
         '_order_by': '-End Time',
         '_offset': 10,
         '_limit': 500}
df = cli.get_cycles(**query)

print(f'Size of returned data: {df.shape}')

# Notice that the first row returned covers 23:49 - 23:50 instead of ending at midnight, because of the offset
df.head()

_only not specified.  Selecting first 50 fields.
Size of returned data: (500, 47)


_id,Machine,Start Time,End Time,Production Day,Cycle Time (Net),Cycle Time (Gross),Shift,Output,0_BM 008: Cans Out,0_BM: CPM,...,BM: Wind Direction,BM: Wind Speed,Machine Status,availability,denominator,edge_arrival_timestamp,is_down,performance,quality,timestamp
kCCmPJZY9ICSclaQoFvkEtEb/oNE5XSHoJ2jACRnwLo=,F2 BM 5,2023-04-01 23:49:00,2023-04-01 23:50:00,2023-04-01,60000.0,60000.0,Night,1.0,81360.0,240.0,...,,,,95.0,100.0,2023-04-01 23:49:54.047225,0.0,96.0,98.0,2023-04-01 23:49:00.000000
ZWu7b1IGSmtLJCVdeHo8DhxAMOc2MFqU4/fKnfFDjLQ=,F2 BM 5,2023-04-01 23:48:00,2023-04-01 23:49:00,2023-04-01,60000.0,60000.0,Night,1.0,81100.0,240.0,...,,,,90.0,100.0,2023-04-01 23:48:55.931628,0.0,91.0,93.0,2023-04-01 23:48:00.000000
aWyVvLof0uxaTOO9COxYaNMdYsuHUaxRcnra/xNwGRA=,F2 BM 5,2023-04-01 23:47:00,2023-04-01 23:48:00,2023-04-01,60000.0,60000.0,Night,1.0,80840.0,245.0,...,,,,95.0,100.0,2023-04-01 23:47:56.427028,0.0,96.0,98.0,2023-04-01 23:47:00.000000
N3e0qZ1FZIEjs41Hd1S/hj+GdEvQ8/bdfVKHIThYt0w=,F2 BM 5,2023-04-01 23:46:00,2023-04-01 23:47:00,2023-04-01,60000.0,60000.0,Night,1.0,80600.0,252.0,...,SW,20.0,,97.0,100.0,2023-04-01 23:46:58.049310,0.0,98.0,100.0,2023-04-01 23:46:00.000000
31SyAfOmaLfZsxM+SfHXSVWoRj/tOckv0Ad/0lJ4AFQ=,F2 BM 5,2023-04-01 23:45:00,2023-04-01 23:46:00,2023-04-01,60000.0,60000.0,Night,1.0,80340.0,240.0,...,,,,92.0,100.0,2023-04-01 23:45:51.433888,0.0,93.0,95.0,2023-04-01 23:45:00.000000


# Data from more than one Machine or filtering by a list of values using `__in`

Filters can specify a list of acceptable values.  This is most commonly used when selecting data from more than one machine, though it can be used on any field name.  This is done by appending `__in` (*note two underscores*) to the column name and then specifying the list of options.  For example:

    'Machine__in': ['Oven1', 'Oven2']

or

    'Status__in': ['Idle', 'Maintenance', 'Down']

**Important** Selecting multiple machines of different types can result in sparse and confusing data frames.  It is strongly recommended to select only machines of the same type.

You can also query for values that are not in a list by using `__nin` with the same format as `__in`.  For example:

    'Product_Code__nin': ['SuperMax 5000', 'MegaValue 6000']
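Following the recommendation to keep each multi-machine query within one machine type, a list of machines can first be grouped by type. A minimal sketch: `group_machines_by_type` is a hypothetical helper, and `get_type` is any callable mapping a machine name to its type, such as `cli.get_type_from_machine` (described later in this notebook), assuming it returns the type name. The mapping in the demo is illustrative, and `Washer 1` is a placeholder machine name.

```python
from collections import defaultdict


def group_machines_by_type(machine_names, get_type):
    """Group machine names by machine type so each __in query stays within one type."""
    groups = defaultdict(list)
    for name in machine_names:
        groups[get_type(name)].append(name)
    return dict(groups)


# Offline demo with an illustrative name -> type mapping
lookup = {"F2 BM 5": "Body Maker", "F3 BM 7": "Body Maker", "Washer 1": "Washer"}
print(group_machines_by_type(lookup, lookup.get))
# {'Body Maker': ['F2 BM 5', 'F3 BM 7'], 'Washer': ['Washer 1']}
```

Each group can then be queried separately, e.g. `cli.get_cycles(**{'Machine__in': names, ...})` once per machine type.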

In [8]:
# Note: querying the first three machines, so roughly three times as many records are returned
query = {'Machine__in': machines[0:3],
         'End Time__gte' : datetime(2023, 4, 1), 
         'End Time__lte' : datetime(2023, 4, 2),  
         '_order_by': '-End Time'}
df = cli.get_cycles(**query)

print(f'Size of returned data: {df.shape}')
# Notice the Machine column now has three different values
df.head()


_limit not specified.  Maximum of 5000 rows will be returned.
_only not specified.  Selecting first 50 fields.
Size of returned data: (4323, 47)


_id,Machine,Start Time,End Time,Production Day,Cycle Time (Net),Cycle Time (Gross),Shift,Output,0_BM 008: Cans Out,0_BM: CPM,...,BM: Wind Direction,BM: Wind Speed,Machine Status,availability,denominator,edge_arrival_timestamp,is_down,performance,quality,timestamp
ZdUUh/8QfnCDmtYZEYkwb5R/VrL3jxGTNxpFU+1H+kw=,F2 BM 6,2023-04-01 23:59:00,2023-04-02 00:00:00,2023-04-01,60000.0,60000.0,Night,1.0,83780.0,240.0,...,,,,97.0,100.0,2023-04-01 23:59:53.929465,0.0,98.0,100.0,2023-04-01 23:59:00.000000
oXzRuM1BVcVuIrUZVw4FkunDEqsfzpMAH9Xg9eVVq+g=,F3 BM 7,2023-04-01 23:59:00,2023-04-02 00:00:00,2023-04-01,60000.0,60000.0,Day,1.0,10300.0,0.0,...,,,,92.0,100.0,2023-04-01 23:59:53.570966,,91.0,90.0,2023-04-01 23:59:00.000000
nE1r2Z5jsUnbhC6LM6FnmZ3J/Cvk1tIzftSEaGVK0gQ=,F2 BM 5,2023-04-01 23:59:00,2023-04-02 00:00:00,2023-04-01,60000.0,60000.0,Night,1.0,83780.0,244.0,...,,,,88.0,100.0,2023-04-01 23:59:57.010326,0.0,89.0,91.0,2023-04-01 23:59:00.000000
KzCzS+3Ihss0sAh73ZCaP3Lsdr3yusDhFuBfwjXZMzk=,F2 BM 6,2023-04-01 23:58:00,2023-04-01 23:59:00,2023-04-01,60000.0,60000.0,Night,1.0,83520.0,240.0,...,,,,98.0,100.0,2023-04-01 23:58:58.629818,0.0,99.0,101.0,2023-04-01 23:58:00.000000
Nc+8VDIzdQR1YvnKilNFBNFx8//8Kxb2xS1nFDI+thA=,F3 BM 7,2023-04-01 23:58:00,2023-04-01 23:59:00,2023-04-01,60000.0,60000.0,Day,1.0,10300.0,0.0,...,,,,93.0,100.0,2023-04-01 23:59:00.421925,,92.0,91.0,2023-04-01 23:58:00.000000


# Filtering to only rows where a specified field exists with `__exists`

Some data fields, such as inspection data, are often quite sparse.  To filter to only rows where a field does or does not have a non-null value, use `__exists`.  Append `__exists` to the field name and set it to `True` to keep rows where the field exists, or `False` to keep rows where it does not.  For example:

    'Inspection_Value__exists': True

or

    'Failure_Code__exists': False

In [9]:
query = {'Machine': machines[0],
         'BM: Trim Height Avg__exists': True,
         'End Time__gte' : datetime(2023, 4, 1), 
         'End Time__lte' : datetime(2023, 4, 2), 
         '_order_by': '-End Time',
         '_only': ['Machine', 'End Time', 'BM: Trim Height Avg']}
df = cli.get_cycles(**query)

print(f'Size of returned data: {df.shape}')
# The result now contains only the few cycles where Trim Height Avg has a value
df.head()

_limit not specified.  Maximum of 5000 rows will be returned.
Size of returned data: (6, 3)


_id,Machine,End Time,BM: Trim Height Avg
K3rxqJBJOosbY1eT0omrRyFsguQUKWdEKe0q9xxdaZI=,F2 BM 5,2023-04-01 22:10:00,131.218831
PypgCWg/NtPvieaQigZpb6W+z7scMXC7SVx4l3C50BQ=,F2 BM 5,2023-04-01 17:40:00,131.212256
/lbBUeOaQTFLtp2iwP7lVLYQvg5+QYcM4Q5QLBE6yig=,F2 BM 5,2023-04-01 13:43:00,131.207373
7YmU47HjwmmBbFq8LeUQTi/Tot+eNgqXPSNlTit9+1o=,F2 BM 5,2023-04-01 10:44:00,131.199
aeE52/nE8/O4oQcLC8RInkskhLpbdJpWRJ5sYBF9kug=,F2 BM 5,2023-04-01 05:37:00,131.227829


# Testing for inequality with `__ne`

A plain `key: value` filter tests whether the field equals the value.  To test for inequality instead, add a `__ne` suffix to the key.  For example, `'StatusCode__ne': 0`.
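To make the operator suffixes concrete, the sketch below evaluates a filter dict against a single row held in a Python dict. It illustrates the semantics described in this notebook (plain equality, `__gt`, `__gte`, `__lt`, `__lte`, `__ne`, `__in`, `__nin`, `__exists`); it is not the SDK's implementation. The real filtering happens server-side, and details such as how nulls are treated by `__ne` may differ there. `row_matches` is a hypothetical name.

```python
import operator

# Map each suffix described in this notebook to a comparison function
_COMPARATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "ne": operator.ne,
    "in": lambda value, options: value in options,
    "nin": lambda value, options: value not in options,
}


def row_matches(row, filters):
    """Return True if a dict-shaped row satisfies every filter (filters are ANDed)."""
    for key, expected in filters.items():
        if key.startswith("_"):
            continue  # query options such as _order_by or _limit are not filters
        field, sep, op = key.rpartition("__")
        if not sep or (op not in _COMPARATORS and op != "exists"):
            field, op = key, "eq"  # no recognised suffix: plain equality test
        value = row.get(field)
        if op == "exists":
            # True keeps rows with a non-null value, False keeps rows without one
            if (value is not None) != bool(expected):
                return False
        elif op == "eq":
            if value != expected:
                return False
        elif value is None or not _COMPARATORS[op](value, expected):
            # This sketch treats a null as failing every comparison operator
            return False
    return True


row = {"Machine": "F2 BM 5", "0_BM: CPM": 244.0, "BM: Trim Height Avg": None}
filters = {
    "Machine__in": ["F2 BM 5", "F2 BM 6"],
    "0_BM: CPM__ne": 0,
    "BM: Trim Height Avg__exists": False,
    "_order_by": "-End Time",
}
print(row_matches(row, filters))  # True
```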



In [10]:
query = {'Machine': machines[0],
         '0_BM: CPM__ne': 0,
         'End Time__gte' : datetime(2023, 4, 3), 
         'End Time__lte' : datetime(2023, 4, 6), 
         '_order_by': '-End Time',
         '_only': ['Machine', 'End Time', '0_BM: CPM']}
df = cli.get_cycles(**query)

print(f'Size of returned data: {df.shape}')
df.head()

_limit not specified.  Maximum of 5000 rows will be returned.
Size of returned data: (3465, 3)


_id,Machine,End Time,0_BM: CPM
fPIhdvsO6NMrv6khHAaXJVZZ18ZcBtbRBcB8xM+kCGM=,F2 BM 5,2023-04-06 00:00:00,244.0
U/WcPGOozipkwMGzKPbGdjsYsfSRx1VGajgt1A9/5aA=,F2 BM 5,2023-04-05 23:59:00,255.0
pXRDhnsfbM17VLNuX+4SonNhQMCuQppbDpj6IRGXktE=,F2 BM 5,2023-04-05 23:58:00,240.0
JKbvlaDx/FxHJ9kIbtUASjCegFyA02s47czeiiiQpYQ=,F2 BM 5,2023-04-05 23:57:00,240.0
0dSSh/F27R/h2MiuYkADQ+6QkFSQ3mZfitUl8Twmgus=,F2 BM 5,2023-04-05 23:56:00,240.0


# Working with Downtimes
Like Cycles, the Downtime data model can be queried for a given machine.  Everything in the sections above still applies; the main difference is that you call `get_downtimes()` instead of `get_cycles()`.

In [11]:
query = {'Machine': machines[0],
         'End Time__gte' : datetime(2023, 4, 1), 
         'End Time__lte' : datetime(2023, 4, 2), 
         '_order_by': '-End Time'}
df = cli.get_downtimes(**query)

print(f'Size of returned data: {df.shape}')
df.head()

Size of returned data: (111, 8)


_id,Machine,Start Time,End Time,Duration,Shift,Downtime Reason,Downtime Category,Downtime Type
VB2pGLD5vcJaE44kjoFrcuCV2gAiLmoAMt1xWKW8+p8=,F2 BM 5,2023-04-01 23:06:21.987,2023-04-01 23:06:51.987,30000.0,Night,Waiting Downstream,Unspecified Category,unplanned
a3FB50wUoACK7QmdSXOGIAvVKdbQKK/XbB47eG503rQ=,F2 BM 5,2023-04-01 23:00:00.000,2023-04-01 23:00:01.897,1897.0,Night,unspecified,Unspecified Category,unplanned
ZtqpfDh4BO4+t/hAVbz8EecRzFzEhzYFri9U6bw1m8k=,F2 BM 5,2023-04-01 22:22:07.560,2023-04-01 22:22:37.560,30000.0,Day,Waiting Downstream,Unspecified Category,unplanned
IF2+xDTTdB4GaVzVXvWDndcgFLp9RwFz2o1BIbsiE4s=,F2 BM 5,2023-04-01 22:20:10.450,2023-04-01 22:21:37.560,87110.0,Day,Waiting Downstream,Unspecified Category,unplanned
Sd2yMYP4gIpzRNVefvvHngQonhUbtIaSyDfVOIlc06A=,F2 BM 5,2023-04-01 22:18:26.230,2023-04-01 22:18:41.230,15000.0,Day,Waiting Downstream,Unspecified Category,unplanned
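In the output above, `Duration` matches the Start/End difference in milliseconds (23:06:21.987 to 23:06:51.987 is 30 s, and Duration is 30000.0), which suggests Duration is reported in milliseconds. Under that assumption, a minimal sketch of totalling downtime per reason; `downtime_seconds_by_reason` is a hypothetical helper, and the sample records are the five rows shown above.

```python
from collections import defaultdict


def downtime_seconds_by_reason(records):
    """Sum downtime Duration per Downtime Reason, converted to seconds.

    Assumes Duration is in milliseconds, as the sample output suggests.
    """
    totals = defaultdict(float)
    for rec in records:
        totals[rec["Downtime Reason"]] += rec["Duration"] / 1000.0
    # Largest contributors first
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# The five rows shown in the output above (only the fields used here)
sample = [
    {"Downtime Reason": "Waiting Downstream", "Duration": 30000.0},
    {"Downtime Reason": "unspecified", "Duration": 1897.0},
    {"Downtime Reason": "Waiting Downstream", "Duration": 30000.0},
    {"Downtime Reason": "Waiting Downstream", "Duration": 87110.0},
    {"Downtime Reason": "Waiting Downstream", "Duration": 15000.0},
]
print(downtime_seconds_by_reason(sample))
```

With the DataFrame returned by `get_downtimes()`, the same result can be obtained with `downtime_seconds_by_reason(df.to_dict('records'))`, or directly in pandas with `df.groupby('Downtime Reason')['Duration'].sum().div(1000).sort_values(ascending=False)`.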


# Working with Parts

Whereas Cycles contain data about work happening on a particular machine, Parts track an object across multiple machines.  Querying parts follows a similar, slightly simpler structure.  With Cycles, the pattern is to find the machine type, then the machine, then the cycle data associated with that machine.  With Parts, you only need two steps: look up the part type, then query the part data.


In [None]:
part_types = cli.get_part_type_names()
part_type = part_types[0]
part_types

In [None]:
# look at parts schema, same as we did above for machine schema
columns = cli.get_part_schema(part_type)['display'].to_list()
columns[:10]

The options for querying parts are similar to querying for cycles - use the same operators described above.

In [None]:
query = {'Part': part_type,
         'End Time__gte' : datetime(2023, 4, 1), 
         'End Time__lte' : datetime(2023, 4, 2),
         'DefectReason__exists': True,
         '_limit': 10,
         '_only': columns[:30]}

df = cli.get_parts(**query)

print(f'Size of returned data: {df.shape}')
df.head()

# Machines and Machine Types
There is additional information about machines and machine types that can be queried from the SDK. This info can help you format or transform your queries to fit your needs. Examples are included below.

## Machine-Level Info

### Timezones

By default, all timestamps are in UTC.  To find the local timezone associated with a machine, use the ```get_machine_timezone``` function and provide the machine name.  This will then return the name of the timezone, which can be used with libraries such as pytz to convert time zones.
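A minimal sketch of converting a UTC timestamp to a machine's local time. It uses the standard-library `zoneinfo` module (Python 3.9+) in place of pytz, and assumes the timezone name returned by `get_machine_timezone` is an IANA name such as `America/New_York`. `to_machine_local` is a hypothetical helper; naive datetimes are treated as UTC, per the statement above.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_machine_local(ts, tz_name):
    """Convert a UTC timestamp to the given IANA time zone.

    Naive datetimes are assumed to be UTC, since SDK timestamps default to UTC.
    """
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(ZoneInfo(tz_name))


local = to_machine_local(datetime(2023, 4, 2, 0, 0), "America/New_York")
print(local.isoformat())  # 2023-04-01T20:00:00-04:00
```

For a whole DataFrame column, pandas offers the equivalent `df['End Time'].dt.tz_localize('UTC').dt.tz_convert(tz)`.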

In [None]:
print(machines[0])
tz = cli.get_machine_timezone(machines[0])
print(tz)

### Get Machine Type from Machine
All machines can be grouped into machine types. You may need to programmatically look up the type of a machine given its name. To get the machine type using machine name (or display name), use ```cli.get_type_from_machine(machine_name)```.

In [None]:
cli.get_type_from_machine(machines[0])

### Get Machine Data Schema
The machine schema is a table containing metadata about the tags included in cycle data for a particular machine. It can be retrieved with ```cli.get_machine_schema(machine_name)```. There are a few additional optional parameters that can be passed to the function:
- ```types```: list of strings specifying a subset of column data types that you want to see.
    - ```cli.get_machine_schema(machine_name, types=['continuous'])```
- ```show_hidden```: (default = False) set to True to see the few additional fields that are hidden by default both here and in MA.

In [None]:
# retrieve the schema for a particular machine
schema = cli.get_machine_schema(machines[0])
schema.head()

In [None]:
# example: look at all the various data types for this model
print(schema['type'].unique())

In [None]:
# example: extract list of tags with numeric data types
schema_numeric = schema[schema['type'].isin(['int', 'float'])]
cols = schema_numeric['display'].to_list()
cols[:10]

In [None]:
# alternative: let the SDK filter the schema by data type (here, 'continuous')
schema_continuous = cli.get_machine_schema(machines[0], types=['continuous'])
schema_continuous.head()

## Machine-Type-Level Info

### Get Machine Type Schema (```get_fields_of_machine_type```)
This function is very similar to `get_machine_schema` above, except that it returns the fields that are part of a machine type definition. It accepts the same optional parameters:
- ```types```: list of strings specifying a subset of column data types that you want to see.
    - ```cli.get_fields_of_machine_type(machine_type, types=['continuous'])```
- ```show_hidden```: (default = False) set to True to see the few additional fields that are hidden by default both here and in MA.

In [None]:
type_dict = cli.get_fields_of_machine_type(types[0])
type_schema = pd.DataFrame(type_dict)
type_schema.head()