# An Introduction to Python for ArcGIS Pro

In this course, we'll be learning to extend and automate the functionality of ArcGIS Pro with the Python programming language. Programming can be an intimidating prospect if you've never done it before (or even if you have!), but it's an incredibly powerful and useful skill to have and well worth the effort it takes to learn. We'll start with a quick trip through the fundamentals of Python and work toward building a custom Script Tool. Along the way we'll learn how to use Python for a number of common ArcGIS Pro workflows. The topics we'll be covering are:

- Fundamentals of Python
- Using ArcPy to list and describe GIS datasets
-

## 1 Fundamentals of Python

To start, we'll need to know some Python basics. This is not meant to be a comprehensive introduction to the language, but it will be enough to get us started. We'll get more practice with these concepts as we go along. More information on Python basics can be found in the Additional Resources section at the end of this notebook.

### 1.1 Using this Notebook

We'll be using this Jupyter/ArcGIS Notebook for the duration of the course. There are certainly other ways to run Python (some of which are described in a PDF document in the same repository as this notebook), but this is great for many purposes and perfect for our current needs. 

ArcGIS/Jupyter notebooks are made up of cells that contain either Markdown text (what you're reading now) or code. The contents of a cell can be executed with:
- Ctrl + Enter (run the current cell and keep it selected)
- Shift + Enter (run the current cell and move the selection to the next cell)
- Run button in top toolbar
- Cell -> Run Cells (or any of the other options in that menu)

We can just type an expression into a notebook cell, execute the cell, and see the expression evaluated as output. This is similar to the standard Python read-eval-print-loop (REPL). Try this below:

In [None]:
10 + 5

*Notice how a number appears in the brackets next to the cell you just ran. This tells you both that you've run the cell and what order you ran it in relative to other cells in the notebook.*

If you enter multiple lines of this kind of input into a cell, every line runs, but only the value of the last line is displayed as output.

More often, you'll be using a cell to contain more standard code:

In [None]:
for i in [1,2,3]:
    j = 10
    k = i + j
    print(f"{i} plus {j} equals {k}")

Note that we're using the `print()` function to specify output in the above case. You can put values between the parentheses and they will output below the code cell once it's executed. Things like `print(5)` or `print("GIS is neat")` will work.

### 1.2 Comments

Comments are a way of adding lines of text to our code that won't execute. These can be a handy way of documenting our code and making it easier to understand. It's often best to use comments to describe parts of the code that are more difficult to immediately understand. Try to provide useful information without writing a novel!

Try executing the code in the cell below:

In [None]:
#The following code prints out 'Hello World'

print('Hello World')

See how none of the text following the # (pound sign/hashtag) was printed out? Anything after the # in a line of code is considered a comment by the Python interpreter. We can add a comment above a section of code, like we did above, or add it to a line of code. The example below shows both options:

In [None]:
#Convert feet to inches

f = 20  #measurement in feet
i = f*12  #measurement in inches
print(f"{f} feet is {i} inches")

Once again, nothing after the `#` had any effect on the code output. What if we want multiple lines of comments?

In [None]:
#This program reads data from an input text file and outputs summary metrics
#Version 1.0.0
#September 1st, 2023
#Contact Stephen Bond for assistance or updates

### 1.3 Variables

A variable is a name that is assigned to a value so that value can be stored in the computer's memory. The name essentially represents the value. For instance:

In [None]:
x = 17
print(x)

In the above example, we've assigned the value `17` to the variable `x`. To assign a value to a variable, we use `=` (the equal sign, also known as the assignment operator). `x = 17` is an example of an **assignment statement**. Assigning a value to a variable can be referred to as defining a variable or declaring a variable.

The variable must always be on the left side of the `=`. `17 = x` will not work. Either `x = 17` or `x=17` (no spaces around the `=`) will work, but the former is typically easier to read. 

Once we've assigned a value to a variable, we can use that variable in our code in place of the value. 

In a standard .py file, we can use a variable below where we assigned a value to it, within a given scope. The code runs top to bottom and we can't use a variable before we create it.

Notebooks are a little weird: They have an understanding of "top to bottom" within a single notebook cell, but cells are related to each other by the order in which they are run. So we can define a variable in one cell and use it in another cell higher up in the notebook as long as we run them in that order.

Let's take a look at this: What happens if we try to use a variable before we define it?

In [None]:
print(y)

y = "Success!"

What happens if we try the print statement again?

In [None]:
print(y)

Why did that happen?

Ok, how about we try defining a variable and then using it in the next cell down?

In [None]:
y = "Success!"

In [None]:
print(y)

That worked! Now let's try running the top example (where we tried to use a variable before defining it) again.

Why did that work?

Always keep this behavior in mind when using notebooks. It's best to keep everything well organized and run cells in order so you don't end up with variables having values you don't expect.


#### 1.3.1 Naming Variables

There are some rules for naming variables in Python:

- Don't start a name with a number. `1variable` will produce an error. It's perfectly acceptable to use numbers elsewhere, as in `variable1`

- Don't use a Python keyword. Python reserves some words for use in its syntax and they can't be used as variable names. The full list can be found at https://docs.python.org/3.9/reference/lexical_analysis.html#keywords , but some examples are:
    - True
    - False
    - and
    - or
    - if
    - try
    - with
    - for
    - def
- Don't use special characters. `variable$$$` will produce an error.

- Don't use spaces. `my variable` won't work.
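If you're ever unsure whether a name follows these rules, Python's standard library can check it for you. Here's a minimal sketch using the `keyword` module and the string method `isidentifier()` (neither is used elsewhere in this course); the names in `candidates` are just illustrations taken from the rules above:

```python
import keyword

# Names to test; these are illustrative examples only
candidates = ["variable1", "1variable", "my variable", "variable$$$", "for", "my_variable"]

for name in candidates:
    # isidentifier() checks the character rules (no leading digit, no spaces,
    # no special characters); iskeyword() checks for reserved Python keywords
    if name.isidentifier() and not keyword.iskeyword(name):
        print(f"{name} is a valid variable name")
    else:
        print(f"{name} is NOT a valid variable name")
```

Only `variable1` and `my_variable` pass both checks.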

Beyond the rules, there are some naming conventions you should observe. Violating these won't break your code but it will make it harder to read:

- Use underscores instead of camel case to clarify variable names: `my_variable` instead of `myVariable` (*Note*: Camel case is used when defining classes in Python, but not variables)

- Avoid starting variables with underscores unless you really mean to. `_variable` doesn't have a special meaning to the Python interpreter, but Python coders conventionally use a leading underscore to signal that a name is meant for internal use only (roughly, "private")

- Avoid using all caps for variable names. Similarly to the above convention, `VARIABLE` doesn't mean anything special to the Python interpreter, but by convention all caps indicates a constant: a value, usually defined near the top of a file, that isn't meant to change.

- Extremely long or confusing variable names like `incredibly_specific_variable_name_for_a_short_list` or `bLaBlaNonSeNse3000` should be avoided. The idea is to make your code as clear and readable as possible, as well as easy to write (typing long variable names over and over is a bummer).

- Try to make variables as descriptive as reasonably possible. If you're using a single input shapefile, `input_shapefile` or similar would work well.

As your Python expertise grows, you'll get more comfortable with what makes a good variable name. For now, keeping these rules and conventions in mind will get you pointed in the right direction.

### 1.4 Built-in Data Types

Variables can store different types of data. The built-in data types in Python that we're going to be talking about are:

- Number
    - Integer
    - Floating Point
- Boolean
- Text
    - String
- Sequence
    - List

There are others, but we'll talk about them if and when we need them.

One bit of terminology here: Any specific instance of one of these data types is often referred to as an **object**. This term has a specific meaning that we don't need to consider at this point. For now, it's fine to know that an object is something with a specific type that you can do specific things with. If `x = "some text"`, for example, we can say that `x` is a string object.

You can check the type of any variable with the `type()` function:

In [None]:
my_list = [100, 200, 'Some Text']
type(my_list)

In [None]:
type(my_list[2])

#### 1.4.1 Numbers

We'll be dealing with the integer (int) and floating point (float) numeric data types. 

**Integers**
- Whole numbers, positive or negative (-5, 0, 1, 2, 3, 1000000, etc.)
- Effectively no limit on the size of an integer (other than system memory)
- *int* in Python


**Floating-Point Numbers**
- Numbers with a decimal point (1.0, 2.5, 20.111111111, etc)
- Can be represented in scientific notation (5.9e6 is the same as 5.9 x 10<sup>6</sup> is the same as 5,900,000)
- *float* in Python

We can use numbers for all of the things you'd expect, like math.
- `+` is used for addition. Ex: `2+2`
- `-` is used for subtraction. Ex: `10-5`
- `*` is used for multiplication. Ex: `4*6`
- `**` is used for exponentiation. Ex: `2**3`
- `/` is used for standard division. Ex: `10/5`

We can do this with integers:

In [None]:
2+2

In [None]:
10-5

In [None]:
4*6

... or with floats:

In [None]:
2.0+2.5

In [None]:
2.0*2.5

Mixing floats and integers will always result in a float:

In [None]:
2*2.5

Similarly, standard division will always result in a float, even when done on two integers:

In [None]:
5/2

Python includes two other division operators: floor division (`//`) and modulo (`%`). These can be very useful, but we won't need them in this course.
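For reference, here's a quick sketch of what those two operators do: `//` divides and rounds down to a whole number, and `%` gives the remainder. The value 135 is an arbitrary example:

```python
total_seconds = 135  # an arbitrary example value

minutes = total_seconds // 60  # floor division: how many whole minutes fit
seconds = total_seconds % 60   # modulo: the remainder left over
print(f"{total_seconds} seconds is {minutes} minutes and {seconds} seconds")

# Floor division rounds down (toward negative infinity), not toward zero
print(7 // 2)   # 3
print(-7 // 2)  # -4
```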

#### 1.4.2 Boolean Values

The Boolean (bool) data type can only have one of two values: True or False. These values will be extremely important for a lot of what we're going to do in this course.

Don't put quotes around True or False for a Boolean. That would make it a string.

In [None]:
print(type(True))
print(type(False))

Certain expressions in Python, using **comparison operators**, return Boolean values. For instance, we can use less-than or greater-than signs: 

In [None]:
x = 12
print(x<20)
print(x>20)

Less-than-or-equal-to and greater-than-or-equal-to can also be used:

In [None]:
print(x<=12)
print(x>=12)

We can also test whether two values are equal. Remember that a single equal sign (`=`) is what we use to assign a value to a variable. Since that's already in use, we use two equal signs (`==`) to test equality:

In [None]:
print(x==20)
print(x==12)

`!=` means not equal to. It's the opposite of `==`.

In [None]:
print(x!=20)
print(x!=12)

#### 1.4.3 Strings

Text in Python is stored in **strings**, which are sequences of characters. We've already seen a lot of these throughout this notebook. A string is defined by surrounding text with single or double quotation marks (or even triple quotes, which are typically only used in special circumstances). A string written out in quotes directly in your code is called a **string literal**. Here are some examples:

In [None]:
print("This is a string")
print('This is also a string')
print('''Yep, still a string''')

An interesting characteristic of strings defined with triple-quotes is that they can be broken up over many lines:

In [None]:
long_str = '''It is a period of civil war.
Rebel spaceships, striking
from a hidden base, have won
their first victory against
the evil Galactic Empire.
'''

print(long_str)

*Note: If you just want to break up a long string in your code for the sake of readability, but not necessarily print it to separate lines, you can do it with backslashes (`\`):*

In [None]:
long_string = "It is a period of civil war. \
Rebel spaceships, striking \
from a hidden base, have won \
their first victory against \
the evil Galactic Empire."

print(long_string)
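Many Python programmers prefer a different approach that avoids backslashes: Python automatically joins string literals that sit right next to each other, and wrapping them in parentheses lets them span several lines. This sketch builds the same single-line string (the name `long_string_2` is just for illustration):

```python
# Adjacent string literals inside parentheses are joined into one string.
# Note the trailing space inside each piece, which separates the words.
long_string_2 = ("It is a period of civil war. "
                 "Rebel spaceships, striking "
                 "from a hidden base, have won "
                 "their first victory against "
                 "the evil Galactic Empire.")

print(long_string_2)
```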

Suppose we'd like to break up a line of text without using the triple quotes option. We can use something called an **escape sequence** (or escape character / escape character sequence). The newline escape sequence is `\n`. You can use it like this:

In [None]:
print("I want this text to be on one line\nI want this text to be on another")

There are other useful escape sequences, like tabs with `\t`:

In [None]:
print("These\twords\tare\tall\tseparated\twith\ttabs")

Because quotes are used to define strings, you can run into issues if you want your text to have quotes in it. We can get around this by using different types of quotes (single or double) than are used around the string, or by using a quote escape sequence (`\'` or `\"`):

In [None]:
print('This text has "quotes" in it')

print("I can add \'single quotes\' or \"double quotes\" to my string with escape sequences.")

You can assign strings to variables and even add them together. This is often called string **concatenation**:

In [None]:
first = "G"
second = "I"
third = "S"
print(first + second + third)

Here's an example where we add spaces between words stored as variables:

In [None]:
first = "GIS"
second = "is"
third = "better with Python"
print(first + " " + second + " " + third + ".")

Perhaps a better way to do this is through the use of **f-strings**. An f-string has the letter f before the quotation marks around the string, and contains variables enclosed in curly brackets:

In [None]:
fname = "Stephen"
lname = "Bond"

print(f"{fname} {lname} is teaching this course.")

You can also include expressions in an f-string:

In [None]:
print(f"There are {24*60*60} seconds in a day.")
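f-strings can also control how a value is displayed: add a colon after the expression inside the braces, followed by a **format specification**. A sketch of two common ones, `.2f` (two decimal places) and `,` (thousands separator); the area value is made up for illustration:

```python
area_sq_m = 1234567.891  # hypothetical area value

print(f"Area: {area_sq_m:.2f} square meters")   # Area: 1234567.89 square meters
print(f"Area: {area_sq_m:,.0f} square meters")  # Area: 1,234,568 square meters
```

The formatting only changes how the number is displayed; `area_sq_m` itself keeps its full value.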

#### 1.4.4 Lists

Lists are sequences of other objects. They have the following properties:

- Lists are ordered: The position of an object in a list matters
- Lists can contain any other type of object, and multiple types of objects
- Lists can be accessed by index
- Lists can be nested, so a list may contain other lists
- Lists are mutable, which means their elements can be changed once created
- Lists are dynamic, which means they can become larger or smaller once created

A list can be created by surrounding a set of objects with square brackets (`[]`) and separating them with commas. Here's a simple example of a list containing integers:

In [None]:
my_list = [1,2,3,4,5]
print(my_list)

Here's an example with a variety of data types (int, float, str, and bool):


In [None]:
my_list = [1, 2.0, "GIS", True]
print(my_list)

How about a nested list (list of lists)?

In [None]:
my_list = [[1,2,3],[4,5,6],[7,8,9]]
print(my_list)

Individual elements of a list are accessed through **indexing**. Each position in a list is numbered. Python uses zero-based indexing, meaning that the first position is numbered 0. Here's an example:

In [None]:
my_list = ['aardvark', 'bandicoot', 'coatimundi']
print(my_list[0])
print(my_list[1])
print(my_list[2])

Note that the final index position is one less than the length of the list because of zero-indexing (my_list has 3 elements, but the final index is 2).

You can also index backwards. Confusingly enough, the last position in a list is `[-1]`, the second to last is `[-2]`, and so on:

In [None]:
my_list = ['aardvark', 'bandicoot', 'coatimundi']
print(my_list[-1])
print(my_list[-2])
print(my_list[-3])

In [None]:
my_list = [[0,0,0],[0,0,1],[0,0,0]]
print(my_list[1][2])

In the above example, we're selecting from the main list (farthest outside brackets) first, then the sub-list. So, we're getting the value in index position 2 from the sub-list in index position 1.
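The list properties at the start of this section say lists are mutable and dynamic. Here's a short sketch of what that means in practice: we can assign a new value to an index position, and use the list's `append` method to make it longer (methods, and the `len` function used here, are covered in the next section). The animal names are just examples:

```python
animals = ['aardvark', 'bandicoot', 'coatimundi']

# Mutable: replace the element at index position 1
animals[1] = 'binturong'
print(animals)       # ['aardvark', 'binturong', 'coatimundi']

# Dynamic: add an element to the end, so the list grows
animals.append('dingo')
print(animals)       # ['aardvark', 'binturong', 'coatimundi', 'dingo']
print(len(animals))  # 4
```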

We can also get more than one element of a list at once through **slicing**. In its most basic form this looks like `my_list[start:stop]`. The stop value is non-inclusive, meaning you need to specify the position after the one you actually want to end on:

In [None]:
river_list = ['Colorado', 'Guadalupe', 'Rio Grande', 'Nueces', 'Comal', 'Brazos']

#Get the first 3 rivers
print(river_list[0:3])
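Slices have a few more options. Leaving out the start or stop value means "from the beginning" or "to the end", negative positions count from the end just like negative indexes, and an optional third value sets a step. Here's a sketch using the same list:

```python
river_list = ['Colorado', 'Guadalupe', 'Rio Grande', 'Nueces', 'Comal', 'Brazos']

print(river_list[:3])   # first 3 rivers, same as [0:3]
print(river_list[3:])   # from index position 3 to the end
print(river_list[-2:])  # the last 2 rivers
print(river_list[::2])  # every other river, starting at index position 0
```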

### 1.5 Using Functions and Methods

**Functions** typically take some input, known as an **argument**, and produce some result, or **return value**. We've seen one of these a lot so far: `print()` is a function. When we give it an argument, like `"Hello World"`, it prints that value out on the screen. `print("Hello World")` is known as a **function call**. Functions can have more than one argument or no arguments at all.

**Methods** are functions associated with a **class**. A class is a template for creating a specific kind of object. `str`, `float`, `list`, and `dict` are all classes, and `"Hello World"` is an **instance** of the `str` class. Classes have **attributes** and **methods**: attributes are the data stored in an object, and methods are the things you can do with that data. Unlike functions, methods are called on a specific object by adding a dot and the method name after it. This may all sound very complicated, and it is, but the important takeaway for now is that when you see something that looks like `my_string.lower()`, you're looking at a method. This syntax is known as **dot notation**.

Let's look at a few common and useful examples:

In [None]:
len("Hello World")

The above example uses the `len` function, which returns the length of a string (its number of characters). This also works on lists, where it returns the number of elements:

In [None]:
my_list = [1,2,3,4,5]
len(my_list)

There are `upper` and `lower` string methods:

In [None]:
my_string = "Geographic Information Systems"

lower = my_string.lower()
upper = my_string.upper()

print(lower)
print(upper)

We can replace parts of a string with the `replace` method:

In [None]:
my_string = "counties.txt"

csv_file = my_string.replace("txt", "csv")
filename = my_string.replace(".txt", "")

print(csv_file)
print(filename)

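The `min` and `max` functions return the minimum and maximum values in a list. This works for numbers and for strings (strings are compared character by character, which gives alphabetical order for lowercase words like these):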

In [None]:
my_list = [1,2,3,4,5]

print(min(my_list))
print(max(my_list))

In [None]:
my_list = ['coatimundi', 'aardvark', 'bandicoot']
print(min(my_list))
print(max(my_list))

The `sort` method will sort a list in-place. (*Note: In-place means that the original list is altered. Use the sorted function if you don't want this behavior*):

In [None]:
my_list.sort()
print(my_list)
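As the note above mentions, the built-in `sorted` function returns a new sorted list and leaves the original alone. A short comparison, also showing the `reverse` argument that both `sort` and `sorted` accept:

```python
my_list = ['coatimundi', 'aardvark', 'bandicoot']

# sorted() builds and returns a new list; my_list itself is not changed
new_list = sorted(my_list)
print(new_list)  # ['aardvark', 'bandicoot', 'coatimundi']
print(my_list)   # ['coatimundi', 'aardvark', 'bandicoot']

# sort() changes my_list in place and returns None, so don't assign its result
my_list.sort(reverse=True)
print(my_list)   # ['coatimundi', 'bandicoot', 'aardvark']
```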

The `round` function will round a number to a specified number of decimal places:

In [None]:
x = round(3.14159, 2)
print(x)
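One behavior of `round` surprises many people: when a number is exactly halfway between two candidates, it rounds to the even one (often called "banker's rounding"). And because most decimal fractions can't be stored exactly as floats, a value that looks like it's halfway may actually be slightly below. A short sketch:

```python
print(round(0.5))       # 0
print(round(1.5))       # 2
print(round(2.5))       # 2, not 3: ties go to the even number
print(round(2.675, 2))  # 2.67: 2.675 is stored as a float just below 2.675
```

With no second argument, `round` returns an integer; with a number of decimal places, it returns a float.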

There are also useful functions that convert one data type into another. For instance, we can take the string representation of a number and convert it into a float value: 

In [None]:
str_num = "3.14"
flt_num = float(str_num)

print(str_num)
print(type(str_num))
print(flt_num)
print(type(flt_num))

We can convert floats to integers:

In [None]:
x = 2.5
int_x = int(x)

print(x)
print(type(x))
print(int_x)
print(type(int_x))

*Notice that this truncates the decimal portion of the float.*

... and vice versa:

In [None]:
x = 2
flt_x = float(x)

print(x)
print(type(x))
print(flt_x)
print(type(flt_x))
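A couple of details about converting to integers are worth knowing. `int` truncates toward zero, so negative numbers move up rather than down, and `int` can't convert a string that contains a decimal point; converting to a float first works. A sketch:

```python
# int() truncates toward zero, so negative numbers move up, not down
print(int(2.9))    # 2
print(int(-2.9))   # -2, not -3

# A string holding a whole number converts directly...
print(int("42"))   # 42

# ...but int("3.14") raises a ValueError.
# Converting to float first, then to int, works.
print(int(float("3.14")))  # 3
```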

### 1.6 Conditionals

Conditionals let us selectively run portions of our code. We can have our code branch down different paths depending on conditions we specify. They work by testing whether a statement is true or false and executing different code based on the result of that test. Here's a very simple example:

In [None]:
x = 2

if x > 5:
    print("The value of x is greater than 5")

Nothing happened. Why? *Try changing the value of x so output is generated.*

Note the syntax we're using here: The if statement ends with a colon, and the statement underneath that executes if the condition is met is indented. The if statement and the `print` line below it form a **compound statement**. It's all one instruction: if some condition is met, do a certain thing.

Also note the indentation under the if statement. In many programming languages, indenting lines is simply a stylistic convention intended to increase readability. **In Python, indentation matters**: it's how Python knows which lines belong to the if statement. The standard convention is four spaces for each level of indentation (4 for the first level, 8 for the next, and so on). Python will technically accept any number of spaces as long as every line in the same block is indented by the same amount, but sticking with four keeps your code consistent and readable. The Tab key in Jupyter (and many other code editors) will automatically add 4 spaces, so there's no need to use the space bar for this.

Let's add some more code to make this more useful:

In [None]:
x = 2

if x > 5:
    print("The value of x is greater than 5")
else:
    print("The value of x is less than or equal to 5")

With the addition of `else` we've got code that will execute whenever the condition we've specified isn't met. *Once again, try changing the value of x to see how the if-else logic works.*

We can take this a bit further and add the `elif` clause:

In [None]:
x = 2

if x > 5:
    print("The value of x is greater than 5")
elif x == 5:
    print("The value of x is equal to 5")
else:
    print("The value of x is less than 5")

This adds another condition to be tested. If the condition in the `if` clause is false, the code moves on to the `elif` clause. If that condition is also false, the code in the `else` clause is executed. An if statement can have any number of `elif` clauses, but only one `if` clause and at most one `else` clause, which must come last.

We can also nest our if statements, using multiple levels of indentation to indicate if statements that execute inside the context of another if statement:

*Note the use of the find method in this example. It searches a string for a specified substring and returns the first index position of that substring if found. If not found, it returns -1.*

In [None]:
animal = "black bear"

if animal.find("bear") != -1:
    print("This is a kind of bear. Here's what to do if it attacks:")
    if animal == "black bear":
        print("Fight back. Resist urge to cuddle.")
    elif animal == "brown bear":
        print("Play dead. Cuddling strongly discouraged.")
    elif animal == "polar bear":
        print("Things can't get any worse, so why not try a cuddle.")
    else:
        print("This is a bear I don't know about. I'm sorry for failing you.")
else:
    print("I don't know anything about this animal. Ask me about bears or leave me alone.")
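As an aside, Python's `in` operator offers a shorter way to ask whether one string contains another. It returns a Boolean directly, so there's no need to compare against -1. A sketch of the same test:

```python
animal = "black bear"

print(animal.find("bear"))  # 6: the index position where "bear" starts
print("bear" in animal)     # True
print("wolf" in animal)     # False

# Equivalent to: if animal.find("bear") != -1:
if "bear" in animal:
    print("This is a kind of bear.")
```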

### 1.7 Loops

Loops are a programming structure that allow you to repeat certain parts of your code until some condition is met or until all inputs into the loop are exhausted. The two types of loops available in Python are **while loops** and **for loops**.

#### 1.7.1 While Loops

While loops continue to run as long as some condition remains true (in other words, until that condition is no longer met):

In [None]:
i = 1

while i <= 10:
    print(f"This while loop has run {i} time(s).")
    i+=1

Notice the structure of the while loop: 1) The statement that starts the while loop ends with a colon, much like an if statement, 2) Any code you want to run as part of the loop is indented, 3) The loop runs as long as the condition in the while statement is true, 4) Something inside the loop (here, `i+=1`) changes the condition so that it eventually becomes false; without it, the loop would run forever (an **infinite loop**).

This is the first time we've seen something like `i+=1`. This is just a shortened version of `i = i+1`, which changes the value stored in the variable `i` to what it was plus 1. If i is 12, then `i+=1` will change it to 13. Every time the loop runs, i increments by 1. `i-=1` would work if you wanted to lower a number every time. 

We often combine loops and conditionals. Notice the second level of indentation:

In [None]:
i = 1

while i <= 10:
    if i == 1:
        print(f"This while loop has run {i} time.")
    if i > 1:
        print(f"This while loop has run {i} times.")
    i+=1

In general, we use while loops when the number of times it needs to execute is indefinite. For example, if we have some incoming data stream and want to stop receiving it once it reaches a certain value, but don't know how long that will take, we might want to use a while loop. Another common use for while loops is requesting user input until an acceptable value is entered. 
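Here's a small sketch of that kind of situation: we don't know in advance how many times a value must be doubled before it passes 1,000, so the while loop keeps going until its condition stops being true and counts the passes along the way (`value` and `doublings` are illustrative names):

```python
value = 1
doublings = 0

while value <= 1000:
    value *= 2      # same as value = value * 2
    doublings += 1  # count each pass through the loop

print(f"Doubled {doublings} times to reach {value}")  # Doubled 10 times to reach 1024
```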

#### 1.7.2 For Loops

For loops are used when we know at the time we start the loop how many times we want to run through it:

In [None]:
for i in [1,2,3,4,5]:
    print(f"This for loop has run {i} time(s).")

Notice that the structure takes the form of `for <variable> in <object>`. For loops in Python always iterate over an object like a list, range, or even a string. From this, we can see that in some ways a string behaves like a list:

In [None]:
for i in "GIS":
    print(i)

The `range` function generates a range object that can be iterated over with a for loop:

In [None]:
for i in range(1,11):
    print(i)
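Like slicing, `range` stops before its stop value, which is why `range(1,11)` ends at 10. It also accepts a single argument (the start then defaults to 0) and an optional third step value. Wrapping a range in `list()` shows its contents:

```python
print(list(range(5)))           # [0, 1, 2, 3, 4]
print(list(range(1, 11)))       # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(list(range(0, 101, 25)))  # [0, 25, 50, 75, 100]
print(list(range(10, 0, -3)))   # [10, 7, 4, 1]: a negative step counts down
```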

It's also common to nest for loops:

In [None]:
for i in ["red", "orange", "green"]:
    for j in ["fruits", "vegetables"]:
        print(i + " " + j)

See how this works? Each element of the second for loop is iterated through before the first for loop moves to its next element. Let's try combining this idea with conditionals:

In [None]:
# print the integer coordinates on a straight line that intersects (0,0) and (10,10)

for x in range(0,11):
    for y in range(0,11):
        if x==y:
            print(f"({x},{y})")

We can use the `range` and `len` functions to loop through two lists at once. We're really looping through index positions and then getting values out of each list based on that index:

In [None]:
cities = ["Austin", "Olympia", "Santa Fe"]
states = ["Texas", "Washington", "New Mexico"]

for i in range(0,len(cities)):
    city = cities[i]
    state = states[i]
    print(f"{city} is the capital of {state}.")

That's enough fundamentals to get us started. Let's move on to applying all this to ArcGIS Pro!

## 2 Exploring and Describing GIS Datasets with ArcPy

Now that we've learned a bit about the fundamentals of the Python language we can get started with ArcPy. ArcPy is a Python **package**, which is a set of Python **modules**. Modules contain Python functions, classes, and variables that can be used by importing the module. They're like libraries of reusable code that provide access to tons of useful tools. 

There are numerous modules that are part of a Python installation by default. These make up what is known as the Python Standard Library (a link to documentation about this can be found at the bottom of this notebook). Many additional packages and modules are available through package repositories like PyPI or conda-forge (typically installed through *pip install* or *conda install*). Thankfully, we don't need to deal with any of this right now, as ArcPy and many other commonly used packages are already installed in the default ArcGIS Pro Python environment we're using.

So, to use ArcPy we'll just need to import it:

In [None]:
import arcpy

If we're working inside ArcGIS Pro, there isn't a need to import ArcPy, as it's already imported by default. However, it's good practice to include the `import arcpy` statement at the top of our code if there's any chance it will be used outside of ArcGIS Pro (from the command prompt, Jupyter, IDLE, or some other IDE for instance).

If we're not using the correct Python interpreter (the one that's installed with ArcGIS Pro), we'll get an error like this:

`ModuleNotFoundError: No module named 'arcpy'`

To use ArcPy, we need to have an ArcGIS Pro license available. If we don't have one (if we haven't signed in to ArcGIS Pro, perhaps), we'll get an error like this:

`RuntimeError: NotInitialized`

### 2.1 Setting Our Workspace and Dealing With File Paths

One of the first things we want to do when using ArcPy is setting our workspace. This will be the location we retrieve data from and write data to, and most of the tools we use in our code will respect this setting.

Before we do this, we need to understand a few things about file paths in Python. Python scripts and notebooks have a current working directory. We can get it by importing the `os` module from the Python Standard Library and using its `getcwd` function:

In [None]:
import os
os.getcwd()

This should display the folder that you're running this notebook from. We can reference file locations relative to this folder (a relative path) or with a full file path (an absolute path).

If you're on Windows, notice that the path above shows two backslashes between folder names instead of the customary single backslash. The notebook displays it this way because a single backslash in a string can start an escape sequence and cause undesired behavior:

In [None]:
filepath = "C:\Temp\newproject\test"
print(filepath)

In the example above, the `\n` and `\t` create newline and tab characters, respectively. This isn't what we want for file paths. To get around this, we can use single forward slashes (the standard separator on Mac and Linux, which Python on Windows also accepts):

In [None]:
filepath = "C:/Temp/GIS"
print(filepath)

... or the double backslashes seen above:

In [None]:
filepath = "C:\\Temp\\GIS"
print(filepath)

Another, perhaps more convenient option is to use a **raw string**. An `r` in front of your string indicates that it should be read as-is with no escape sequences:

In [None]:
filepath = r"C:\Temp\GIS"
print(filepath)

Any of the options above will work, but the raw string method is particularly great when we're copying a file path from Windows Explorer and want to just paste it directly into our code.
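A short sketch to confirm that the raw-string and double-backslash spellings produce the identical string, plus the standard library's `os.path.join`, which inserts the right separator for your operating system so you don't have to type it yourself (this same function is used later in this notebook). "Data" and "DC.gdb" match folder names used in this notebook; `gdb_path` is just an illustrative name:

```python
import os

raw = r"C:\Temp\GIS"       # raw string: backslashes kept as-is
escaped = "C:\\Temp\\GIS"  # each backslash escaped
print(raw == escaped)      # True: both are the same 11 characters

# Caution: a raw string can't end in a single backslash; r"C:\Temp\" is a SyntaxError.

# os.path.join uses the separator for the current operating system
gdb_path = os.path.join("Data", "DC.gdb")
print(gdb_path)  # Data\DC.gdb on Windows, Data/DC.gdb on Mac or Linux

# os.path.abspath turns a relative path into a full path based on the current working directory
print(os.path.abspath(gdb_path))
```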

Now let's set our workspace:

In [None]:
arcpy.env.workspace = "Data"

The folder that contains this notebook also contains a Data folder. Since we can specify a file location relative to the current working directory, we only need to tell Python the name of this folder.

We can verify that this is the correct workspace location by checking to make sure a dataset that we know is inside that folder exists in our workspace. We can do this with `arcpy.Exists`:

In [None]:
if arcpy.Exists("watersheds.shp"):
    print("Your workspace is set correctly.")
else:
    print("Your workspace is wrong.")

#### 2.1.1 Other Environment Settings

There are many other environment settings we can set through ArcPy. Using the `dir()` function gives us a list of the available options:

In [None]:
dir(arcpy.env)

Descriptions of all these environment settings can be found at: https://pro.arcgis.com/en/pro-app/latest/arcpy/classes/env.htm

One commonly used environment setting is `arcpy.env.overwriteOutput`. If this is set to False, we'll get an error every time our script tries to create a file that already exists, and we'll have to manually delete any output feature classes or shapefiles before running the script again. This may be the desired behavior for our script, but it can be a pain when testing things out. Setting `arcpy.env.overwriteOutput` to True lets us run our code repeatedly without "already exists" errors:

In [None]:
arcpy.env.overwriteOutput = True

### 2.2 Describing Data

The various properties of different geospatial datasets will impact how we can use them in our scripts. ArcPy has two very similar ways of getting these properties from a geospatial data file. The first way is with `arcpy.Describe()`:

In [None]:
arcpy.Describe("watersheds.shp")

In a Notebook, calling this function displays all of the information about the dataset. However, this is not the case if we use it in a .py script or from the command line. Even in a Notebook, we need a way to get a specific piece of this information. `arcpy.Describe()` returns a Describe object:

In [None]:
desc = arcpy.Describe("watersheds.shp")
type(desc)

We can access various attributes of the dataset by using dot notation:

In [None]:
print(desc.shapeType)

Some information requires a little more work to get to. For instance, the `spatialReference` attribute is itself an object, with its own attributes for the coordinate system's name, well-known ID, and linear unit.

**Try using arcpy.Describe() to print the coordinate system name for watersheds.shp**

In [None]:
# Try it here!







In [None]:
# Solution
desc = arcpy.Describe("watersheds.shp")
sr = desc.spatialReference
print(sr.name)

We can get a great deal of useful information about a dataset's file location from `arcpy.Describe()`:

In [None]:
desc = arcpy.Describe("watersheds.shp")

print(f"dataType: {desc.dataType}")
print(f"extension: {desc.extension}")
print(f"path: {desc.path}")
print(f"catalogPath: {desc.catalogPath}")
print(f"file: {desc.file}")
print(f"baseName: {desc.baseName}")

### 2.3 Listing Data

Getting a list of the data available in our workspace is an important part of many ArcPy-based scripts. `arcpy.ListFeatureClasses()` is the function we use for this:

In [None]:
arcpy.ListFeatureClasses()

Let's go ahead and change our workspace to be the geodatabase in our data folder. The code below will make that switch and list the data in the new workspace. Only run this once:

In [None]:
import os
current_ws = arcpy.env.workspace
arcpy.env.workspace = os.path.join(current_ws, "DC.gdb")
fclist = arcpy.ListFeatureClasses()
print(fclist)

This is a list of all the feature classes in the "DC" geodatabase. We can narrow things down a bit further with a wildcard search...

In [None]:
fclist = arcpy.ListFeatureClasses("b*")
print(fclist)

...or by feature type:

*Note: `feature_type` is the second optional parameter of `ListFeatureClasses()`, so we need to either pass in an empty string for `wild_card` or name the parameter explicitly (for example, `feature_type="point"`).*

In [None]:
fclist = arcpy.ListFeatureClasses("", "point")
print(fclist)

`arcpy.ListRasters()` and `arcpy.ListFiles()` work similarly. The former will list all rasters in a workspace and the latter will list all files in a workspace, with output similar to os.listdir().
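For comparison, here is a rough plain-Python analogue of a call like `arcpy.ListFiles("*.shp")`, built from `os.listdir()` and the standard-library `fnmatch` module. It is only a sketch and not how ArcPy works internally: it creates a few hypothetical empty files in a temporary folder, skips subfolders, and uses `fnmatch` wildcard rules, which may differ from ArcPy's in details such as case sensitivity.

```python
import fnmatch
import os
import tempfile


def list_files(folder, wild_card="*"):
    """Return the sorted names of files (not subfolders) in folder that match wild_card."""
    return sorted(
        name
        for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name)) and fnmatch.fnmatch(name, wild_card)
    )


with tempfile.TemporaryDirectory() as folder:
    # Hypothetical empty files and one subfolder, created only for this demonstration
    for name in ["bike_routes.shp", "bike_routes.dbf", "texas.prj", "notes.txt"]:
        open(os.path.join(folder, name), "w").close()
    os.mkdir(os.path.join(folder, "DC.gdb"))

    all_files = list_files(folder)            # every file; the DC.gdb folder is skipped
    shapefiles = list_files(folder, "*.shp")  # only names ending in .shp

print(all_files)
print(shapefiles)
```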

`arcpy.ListWorkspaces()` will list all workspaces in our workspace. This may sound strange at first, but if we define a folder as our workspace, `arcpy.ListWorkspaces()` will return any subfolders or geodatabases contained in that folder.

`arcpy.ListDatasets()` will return any datasets in a workspace. This is typically used for Feature Datasets, but it will also return geometric networks, networks, raster catalogs, topologies, and others:

In [None]:
arcpy.ListDatasets()

The last method for listing geospatial data we'll look at is `arcpy.da.Walk()`. The `da` means that the `Walk` function is contained within ArcPy's Data Access module. Let's switch back to our original workspace for this:

In [None]:
arcpy.env.workspace = "Data"

# Walk() needs the top-level folder or workspace to start from
walk = arcpy.da.Walk(arcpy.env.workspace)
for dirpath, dirnames, filenames in walk:
    print(dirpath, dirnames, filenames)

This syntax is a bit convoluted, but we're essentially walking down through the file structure of our workspace. In this case, `dirpath` is the path whose files and directories/geodatabases/datasets we're listing, `dirnames` is a list of the directories inside that `dirpath`, and `filenames` is a list of the files in that path.

We start with our workspace as the first `dirpath` and see that it has one `dirname` ("DC.gdb") and several `filenames` (bike_routes.shp, etc.) in it. Then we move into "DC.gdb" and see that it includes the "Transportation" `dirname` and several `filenames` (neighborhoods, etc). We keep going deeper like this until we reach the bottom (each `dirpath` contains no `dirnames`).
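`arcpy.da.Walk()` follows the same pattern as the standard library's `os.walk()`, which also yields a `(dirpath, dirnames, filenames)` tuple for each folder, starting at the top. The plain-Python sketch below builds a small hypothetical folder tree in a temporary directory and walks it with `os.walk()`. Unlike `arcpy.da.Walk()`, `os.walk()` treats a geodatabase as an ordinary folder and knows nothing about feature classes, so this only illustrates the traversal order.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout: a file at the top level and one inside DC.gdb
    os.makedirs(os.path.join(top, "DC.gdb", "Transportation"))
    open(os.path.join(top, "bike_routes.shp"), "w").close()
    open(os.path.join(top, "DC.gdb", "neighborhoods"), "w").close()

    visited = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Show each dirpath relative to the top folder to keep the output short
        entry = (os.path.relpath(dirpath, top), sorted(dirnames), sorted(filenames))
        visited.append(entry)
        print(*entry)
```

Each level is visited before the levels below it, and the walk ends once a `dirpath` has no `dirnames` left to descend into.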

We can loop through the values that `arcpy.da.Walk()` returns to print the full path of every file contained in a directory and its subdirectories:

In [None]:
import os
walk = arcpy.da.Walk(arcpy.env.workspace)
for dirpath, dirnames, filenames in walk:
    for file in filenames:
        print(os.path.join(dirpath, file))

#### 2.3.1 Looping Through Lists of Geospatial Data

Since ArcPy data listing functions like `arcpy.ListFeatureClasses()` return a list, we can use for loops to run some process on every feature class in a workspace. The code below will get the shapefiles from our workspace and copy them into the geodatabase that's also in our workspace:

In [None]:
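import os

# The first file geodatabase in our folder (DC.gdb)
outgdb = arcpy.ListWorkspaces("*", "FileGDB")[0]
print(f"Geodatabase path is: {outgdb}")

# With no arguments, ListFeatureClasses() lists the current workspace (our shapefiles)
fcs = arcpy.ListFeatureClasses()
print(fcs)

for fc in fcs:
    # arcpy.da.Describe() returns a dictionary, so we use keys instead of dot notation
    desc = arcpy.da.Describe(fc)
    out_fc = os.path.join(outgdb, desc["baseName"])
    arcpy.management.CopyFeatures(fc, out_fc)
    print(f"Copied {fc} to {out_fc}")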

#### 2.3.2 Listing Fields

Listing the fields present in a feature class works a lot like listing the feature classes in a workspace, with a few important differences:

In [None]:
fields = arcpy.ListFields("watersheds.shp")
print(fields)

Instead of returning a list of names, `arcpy.ListFields()` returns a list of Field objects. We need to do a little more work to access the data contained in these objects:

In [None]:
fields = arcpy.ListFields("watersheds.shp")
for field in fields:
    print(f"{field.name} is a field of type {field.type} with a length of {field.length}")

It's fairly common to need a list of field names in a Python script. Since `arcpy.ListFields()` returns a list of field objects, this has to be built manually.

**Try creating a list of all the field names for the public_schools feature class inside DC.gdb**

In [None]:
# Try it here!

# Hint: the append method can be used to add an item to a list. If you have a list named "field_names",
# you can add an item "i" to it with "field_names.append(i)"

field_names = []

In [None]:
# Solution

field_names = []

for f in arcpy.ListFields("DC.gdb/public_schools"):
    field_names.append(f.name)
    
print(field_names)
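
The same list can be built in one line with a list comprehension. With ArcPy that would read `field_names = [f.name for f in arcpy.ListFields("DC.gdb/public_schools")]`. So that the sketch below runs without ArcPy, a hypothetical `namedtuple` with made-up values stands in for the Field objects; only its `name` attribute is used.

```python
from collections import namedtuple

# Hypothetical stand-in for ArcPy Field objects
Field = namedtuple("Field", ["name", "type", "length"])

fields = [
    Field("OBJECTID", "OID", 4),
    Field("Shape", "Geometry", 0),
    Field("NAME", "String", 50),
]

# Same result as the append loop above, written as a list comprehension
field_names = [f.name for f in fields]
print(field_names)
```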

### 2.4 Spatial Reference Objects

We'll soon learn how to get the coordinate system from a spatial dataset, or assign a new one, but let's start out by interacting with Spatial Reference objects. To start, we can find the Spatial Reference defined in a projection file (part of a shapefile):

*Note: When dealing with a projection file like this, the workspace environment setting isn't respected, so we need to use os.path.join to glue our workspace and projection file into a single usable file path.*

In [None]:
import os
prjfile = os.path.join(arcpy.env.workspace, "Texas.prj")
spatialref = arcpy.SpatialReference(prjfile)
print(spatialref.name)

We can use this to create a new, empty spatial dataset that uses the same projection:

In [None]:
prjfile = os.path.join(arcpy.env.workspace, "Texas.prj")
spatialref = arcpy.SpatialReference(prjfile)
out_path = arcpy.env.workspace
out_name = "empty_pt.shp"
arcpy.management.CreateFeatureclass(out_path, out_name, "POINT", spatial_reference=spatialref)

We can directly use the name of a coordinate system to create a Spatial Reference object:

In [None]:
sr = arcpy.SpatialReference("NAD 1983 StatePlane Texas Central FIPS 4203 (US Feet)")

print(sr.name)

In practice this can be a bit cumbersome, so it's common to work with a projection's Well-Known ID (WKID). We can get the WKID from an existing Spatial Reference object by accessing its Factory Code attribute:

In [None]:
print(sr.factoryCode)

We can also create a Spatial Reference object using the WKID:

In [None]:
sr = arcpy.SpatialReference(2277)
print(sr.name)

We can get a coordinate system's type (projected or geographic) and units of measure from a Spatial Reference object:

In [None]:
print(sr.type)
print(sr.linearUnitName)

ArcPy also allows us to search for coordinate systems using `arcpy.ListSpatialReferences()`:

In [None]:
arcpy.ListSpatialReferences("*Texas Central*", "PCS")

The first argument in `arcpy.ListSpatialReferences()` is a wildcard search string. In the above example, we'll get any coordinate system name that contains the string "Texas Central". The optional second argument specifies the type of coordinate system we're looking for: "GCS" for geographic, and "PCS" for projected.

We can use some set logic to narrow our search if we need to get more specific:

In [None]:
srs_tx = arcpy.ListSpatialReferences("*Texas Central*")
srs_sp = arcpy.ListSpatialReferences("*State Plane*")
srs_ft = arcpy.ListSpatialReferences("*Feet*")

srs = set(srs_tx) & set(srs_sp) & set(srs_ft)

for sr in srs:
    print(sr)
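
To see what the `&` operator is doing, here is a plain-Python sketch that runs without ArcPy. The coordinate system names in the list are made-up examples, and the hypothetical `search()` helper uses `fnmatch` to stand in for ArcPy's wildcard matching (an assumption; ArcPy may treat details such as case differently). Each search returns a list, converting the lists to sets lets `&` keep only the names that appear in every one of them.

```python
import fnmatch

# Made-up names standing in for the strings ArcPy returns
names = [
    "NAD 1983 StatePlane Texas Central FIPS 4203 (US Feet)",
    "NAD 1983 StatePlane Texas Central FIPS 4203 (Meters)",
    "NAD 1983 UTM Zone 14N",
]


def search(pattern):
    """Return the names matching a wildcard pattern, similar in spirit to ListSpatialReferences()."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


# & keeps only the names present in all three sets
srs = set(search("*Texas Central*")) & set(search("*StatePlane*")) & set(search("*Feet*"))
print(srs)
```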

Information on all coordinate systems available in ArcGIS Pro can be found in two PDFs:

- Projected: https://pro.arcgis.com/en/pro-app/latest/arcpy/classes/pdf/projected_coordinate_systems.pdf
- Geographic: https://pro.arcgis.com/en/pro-app/latest/arcpy/classes/pdf/geographic_coordinate_systems.pdf

## 3 Geoprocessing and Data Manipulation with ArcPy

Now that we know a bit about accessing geospatial datasets with Python and ArcPy, it's time to look at how we can actually work with those datasets. ArcPy acts as a wrapper around the geoprocessing tools we're used to using in ArcGIS Pro, giving us access to them in our code. We can also use cursors to access and edit our geospatial data at the feature level. These are both powerful tools for automating data analysis and management workflows.

### 3.1 Using Geoprocessing Tools

ArcPy gives us access to all of the geoprocessing tools available in ArcGIS Pro. Tools can be run from Python in two similar ways, through functions and through toolbox modules. Let's try running the Clip tool as a function:

*Make sure your workspace is set to the location of your demo files*

In [None]:
arcpy.Clip_analysis("USA_Major_Cities.shp", "Texas.shp", "Texas_Cities.shp")

The syntax for calling a tool as a function is `arcpy.<toolname>_<toolboxalias>(<parameters>)`.

Alternatively, we can call a tool from its associated toolbox module. The syntax for this follows the form `arcpy.<toolboxalias>.<toolname>(<parameters>)`. For the Clip tool, this looks like:

In [None]:
arcpy.analysis.Clip("USA_Major_Cities.shp", "Texas.shp", "Texas_Cities.shp")

Either of these methods is correct.

Getting the parameter syntax correct is key to successfully running a geoprocessing tool. Every tool has a set of parameters it requires to function. We can find these by checking the online documentation for the tool. Let's take a look at the web page for the Clip tool:

https://pro.arcgis.com/en/pro-app/latest/tool-reference/analysis/clip.htm

Toward the bottom of the page, there is a "Parameters" section. If we click on the Python tab of the table in that section, we'll get a general idea of the required syntax. In this case, it looks like:

`arcpy.analysis.Clip(in_features, clip_features, out_feature_class, {cluster_tolerance})`

From this we can see that when we ran the Clip tool above, our parameters were:
- in_features = "USA_Major_Cities.shp"
- clip_features = "Texas.shp"
- out_feature_class = "Texas_Cities.shp"

Notice that the bottom of the Clip tool documentation contains code samples to demonstrate use of the tool in Python. If you're ever confused about how to make a particular geoprocessing tool work, a quick web search should give you access to the tool's official documentation.

It's also worth noting that hitting the Tab key in a Jupyter/ArcGIS notebook can help with code completion, though it won't give us much in the way of useful parameter syntax information.

**Try finding information on the syntax for the Buffer tool, and run it on the Texas_Cities shapefile with a buffer distance of 50 miles.**

In [None]:
# Try it here!





In [None]:
# Solution

arcpy.analysis.Buffer("Texas_Cities.shp", "Cities_50mi_buffer", "50 MILES")

A useful technique when working with geoprocessing tools is to define variables for our tool parameters. This lets us reuse parameters throughout our code and keep our geoprocessing tool calls short and easy to read. Variable names don't have to match tool parameter names, but they should be given meaningful names that make our code relatively easy to interpret:

In [None]:
infc = "USA_Major_Cities.shp"
clipfc = "Texas.shp"
outfc = "Texas_Cities.shp"
arcpy.analysis.Clip(infc, clipfc, outfc)

#### 3.1.1 Result Objects

ArcPy returns the output of a geoprocessing tool as a Result object. When a tool outputs or modifies a feature class or shapefile, the Result object will contain the path to that output file. Other tools may produce a Result object that contains a string, Boolean, or number. In the first example below, the `GetCount` geoprocessing tool returns the number of records in a specified dataset:

In [None]:
cities_count = arcpy.management.GetCount("USA_Major_Cities.shp")
print(type(cities_count))
print(cities_count)

The Clip tool produces a new feature class or shapefile, so its Result object holds a file path:

In [None]:
tx_cities = arcpy.analysis.Clip("USA_Major_Cities.shp", "Texas.shp", "Texas_Cities.shp")
print(tx_cities)

The Result object from one tool can be used as an input to another tool, which allows us to conveniently string together geoprocessing tools for complex workflows. In the example below, we use Clip to get the major cities in Texas, and then use GetCount on the output to count those cities:

In [None]:
tx_cities = arcpy.analysis.Clip("USA_Major_Cities.shp", "Texas.shp", "Texas_Cities.shp")
tx_city_count = arcpy.management.GetCount(tx_cities)
print(tx_city_count)

**Try using this method to Buffer the state of Texas by 100 miles and Clip USA_Major_Cities to the result**

In [None]:
# Try it here!





In [None]:
# Solution

tx_buffer = arcpy.analysis.Buffer("Texas.shp", "Texas_100mi_buffer", "100 MILES")
arcpy.analysis.Clip("USA_Major_Cities.shp", tx_buffer, "Cities_100mi")

#### 3.1.2 More on Tool Parameters

As we've seen, the syntax for the Buffer geoprocessing tool looks like this:

`arcpy.analysis.Buffer(in_features, out_feature_class, buffer_distance_or_field, {line_side}, {line_end_type}, {dissolve_option}, {dissolve_field}, {method})`

The parameters in the curly brackets are optional, meaning the tool will still run if they're not specified. We run into a conundrum if we want to specify some of these optional parameters, but not all of them. For instance, if we want to dissolve all buffer polygons, we need to use the `dissolve_option` parameter:

In [None]:
arcpy.analysis.Buffer("Texas_Cities.shp", "Cities_50mi_buffer", "50 MILES", "ALL")

This gives an error because Python thinks we're trying to specify that the `line_side` parameter is "ALL". Without some additional syntax, parameters have to be passed into a function in the correct order. We have the option of skipping the optional parameters we don't want to use:

In [None]:
arcpy.analysis.Buffer("Texas_Cities.shp", "Cities_50mi_buffer_dissolve", "50 MILES", "", "", "ALL")

or:

In [None]:
arcpy.analysis.Buffer("Texas_Cities.shp", "Cities_50mi_buffer_dissolve", "50 MILES", None, None, "ALL")

We can also explicitly specify which parameter we're setting:

In [None]:
arcpy.analysis.Buffer("Texas_Cities.shp", "Cities_50mi_buffer_dissolve", "50 MILES", dissolve_option="ALL")
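
To see why a positional "ALL" lands in the wrong place, here is a hypothetical plain-Python function whose parameter list mirrors the Buffer signature shown above. It does no geoprocessing and is not ArcPy's implementation; it only reports which optional parameters received a value.

```python
def buffer(in_features, out_feature_class, buffer_distance_or_field,
           line_side=None, line_end_type=None, dissolve_option=None,
           dissolve_field=None, method=None):
    """Hypothetical stand-in: return the optional parameters that were given a value."""
    optional = {
        "line_side": line_side,
        "line_end_type": line_end_type,
        "dissolve_option": dissolve_option,
        "dissolve_field": dissolve_field,
        "method": method,
    }
    return {name: value for name, value in optional.items() if value is not None}


# Positional: "ALL" fills the next open slot, which is line_side
positional_result = buffer("Texas_Cities.shp", "out", "50 MILES", "ALL")
# Keyword: "ALL" goes exactly where we name it
keyword_result = buffer("Texas_Cities.shp", "out", "50 MILES", dissolve_option="ALL")

print(positional_result)
print(keyword_result)
```

Keyword arguments also make a call easier to read later, since the parameter name documents what each value is for.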

#### 3.1.3 Tool Messages

When we run geoprocessing tools in ArcGIS Pro (or, as we've seen, in a Notebook), we typically get a number of messages from the tool, such as when it started and whether it succeeded.

If we run a geoprocessing tool from a Python script, we'll get an error message if the tool fails to run, but we won't get anything else without some additional code. We can use `arcpy.GetMessages()` to get the messages from the most recently run tool:

In [None]:
arcpy.Clip_analysis("USA_Major_Cities.shp", "Texas.shp", "Texas_Cities.shp")
print(arcpy.GetMessages())

We can also get individual pieces of a tool message. `arcpy.GetMessage(0)` will return the first tool message:

In [None]:
print(arcpy.GetMessage(0))

`arcpy.GetMessageCount()` will return the number of messages a tool produces, and we can use this to get only the final message:

In [None]:
count = arcpy.GetMessageCount()
print(arcpy.GetMessage(count-1))

Tool messages have a severity property. There are 3 severity levels:
- Severity 0: Information about tool execution
- Severity 1: Warning message. This doesn't prevent a tool from running, but may be worth investigating.
- Severity 2: Error that prevents a tool from running
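
Conceptually, the maximum severity of a tool run is just the highest severity among its messages. The plain-Python sketch below illustrates that idea with a hypothetical list of `(severity, message)` pairs; the message text is invented, and in a real script the messages would come from ArcPy rather than a list.

```python
SEVERITY_LABELS = {0: "information", 1: "warning", 2: "error"}

# Hypothetical messages from a tool run, as (severity, text) pairs
messages = [
    (0, "Tool started"),
    (1, "Example warning text"),
    (0, "Tool succeeded"),
]

# The highest severity tells us the worst thing that happened during the run
max_severity = max(severity for severity, _ in messages)
print(max_severity, SEVERITY_LABELS[max_severity])

# Keep only the warnings and errors
problems = [text for severity, text in messages if severity >= 1]
print(problems)
```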

We can use `arcpy.GetMaxSeverity()` to return the highest message severity for a tool run. If we do this for a successful run with nothing unexpected happening, we can see that the max severity is 0:

In [None]:
arcpy.Clip_analysis("USA_Major_Cities.shp", "Texas.shp", "Texas_Cities.shp")
print(arcpy.GetMaxSeverity())

If we run the Clip tool with an incorrect input parameter, we can see that the maximum severity is 2:

*Note: we need to use try and except here because otherwise the code will stop executing after the Clip tool fails and we'll never get to run arcpy.GetMaxSeverity().*

In [None]:
try:
    arcpy.Clip_analysis("error.shp", "Texas.shp", "Texas_Cities.shp")
except arcpy.ExecuteError:
    print(arcpy.GetMaxSeverity())

Lastly, if we want to get tool messages from something other than the most recent tool run, we can use Result objects:

In [None]:
texas_cities = arcpy.Clip_analysis("USA_Major_Cities.shp", "Texas.shp", "Texas_Cities.shp")
cities_count = arcpy.management.GetCount("USA_Major_Cities.shp")

print(texas_cities.getMessages())

`getMessage()` will also work here:

In [None]:
count = texas_cities.messageCount
print(texas_cities.getMessage(count-1))

Note the subtly different syntax used here. When we're using a function to get the messages from the last tool run, it's `arcpy.GetMessages()`. When we're using the method of a Result object, it's `<result_object>.getMessages()`.

*Note: `arcpy.GetMessages` and `arcpy.GetMaxSeverity` are examples of ArcPy functions that are not geoprocessing tools. So are `arcpy.Describe` and `arcpy.Exists`. These will always be called using syntax like `arcpy.<function_name>(<parameters>)`. All of the other functions available in ArcPy can be found here: https://pro.arcgis.com/en/pro-app/latest/arcpy/functions/alphabetical-list-of-arcpy-functions.htm*

### 3.3 Cursors

We can access individual rows of a geospatial dataset through the use of **cursors**. "Cursor" is a database term for a way to access a specific set of rows in a table. In ArcPy, cursors give us the ability to search for and manipulate specific rows (or features) in a geospatial dataset. We can use a **search cursor** to retrieve the rows of a feature class:

In [None]:
cursor = arcpy.da.SearchCursor("Corridors", "corridor_n")

for row in cursor:
    print(row)

The syntax for `arcpy.da.SearchCursor()` is: 

`SearchCursor (in_table, field_names, {where_clause}, {spatial_reference}, {explode_to_points}, {sql_clause}, {datum_transformation})`

The two required parameters are `in_table`, which is the dataset we're trying to get information from, and `field_names`, which is the field or fields in the dataset that will be returned by the cursor. If we only want to see a single field (as in the example above), we input that field name as a string. If we want to see more than one field, those field names should be passed into the cursor as a list or tuple of strings. Each row is returned as a tuple, so its values can be accessed by index just like the elements of a list.
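
Because each row is a tuple, we can also unpack it into named variables in the `for` statement, which is often easier to read than indexing. With ArcPy, the loop would start with something like `for name, center_type in arcpy.da.SearchCursor("Centers", ["center_nam", "center_typ"]):`. The sketch below uses a plain list of tuples with hypothetical values in place of a cursor so it runs without ArcPy.

```python
# Hypothetical rows, shaped like what a cursor on two fields would yield
rows = [("Center A", "Job Center"), ("Center B", "Training Center")]

labels = []
for name, center_type in rows:
    # Each tuple is unpacked into one variable per requested field, in field order
    labels.append(f"{name} is a {center_type}")

print(labels)
```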

Much like we've seen with other files, we can use a context manager with a cursor. This ensures that the file is properly closed, data locks are released, and iteration is reset to the top of the file:

In [None]:
with arcpy.da.SearchCursor("Corridors", "corridor_n") as cursor:
    for row in cursor:
        print(f"Corridor Name: {row[0]}")
    

There are two kinds of data locks used by ArcGIS Pro: **shared locks** and **exclusive locks**. Shared locks are placed on a file whenever it's accessed. They don't prevent another user or process from accessing the dataset as well, but they do prevent an exclusive lock from being placed on the file. Exclusive locks are applied whenever edits are being made to a dataset. These ensure that no other process can make changes to the file at the same time. To make sure locks are released when we're done with a cursor, we can either use a context manager (`with`, shown above) or use the `del` keyword to delete the cursor:

In [None]:
cursor = arcpy.da.SearchCursor("Corridors", "corridor_n")
for row in cursor:
    print(f"Corridor Name: {row[0]}")
del cursor
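
One advantage of `with` over `del` is that the cleanup runs even if the code inside the block raises an error, whereas a `del` placed after a failing loop is never reached. The class below is a hypothetical plain-Python stand-in for a cursor, written only to show when cleanup happens; it does not model ArcPy's actual locking.

```python
class FakeCursor:
    """Hypothetical stand-in for a cursor that records when it is released."""

    def __init__(self, rows):
        self.rows = rows
        self.released = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs whether the block finished normally or raised an exception
        self.released = True
        return False  # do not suppress the exception

    def __iter__(self):
        return iter(self.rows)


cursor = FakeCursor([("Corridor 1",), (None,)])
try:
    with cursor:
        for row in cursor:
            print(row[0].upper())  # fails on the second row, where the value is None
except AttributeError:
    print("Loop failed, released:", cursor.released)
```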

An **insert cursor** adds new rows to the bottom of the dataset. The basic syntax of `arcpy.da.InsertCursor()` is similar to that of `arcpy.da.SearchCursor()`, but it has fewer optional parameters:

`InsertCursor (in_table, field_names, {datum_transformation}, {explicit})`

The example below uses the `insertRow()` method to add a new row to the bottom of a dataset, with specified fields populated:

In [None]:
with arcpy.da.InsertCursor("Corridors", "corridor_n") as cursor:
    cursor.insertRow(["NEW CORRIDOR"])

We can use a while loop to add more than one row at a time:

In [None]:
with arcpy.da.InsertCursor("Corridors", "corridor_n") as cursor:
    x = 1
    while x <= 5:
        cursor.insertRow(["NEW CORRIDOR"])
        x+=1

It's important to note that while we're adding rows to the feature class's table, we haven't populated them with any sort of geometry. We will learn how to do that in the next lecture.

An **update cursor** allows us to make changes to existing rows in a dataset through the `updateRow()` method:

In [None]:
with arcpy.da.UpdateCursor("Centers", "modified_b") as cursor:
    for row in cursor:
        row[0] = "Stephen B"
        cursor.updateRow(row)

Update cursors can also be used to delete rows in a dataset using the `deleteRow()` method:

In [None]:
with arcpy.da.UpdateCursor("Centers", "center_nam") as cursor:
    for row in cursor:
        if row[0] == "Rio di Vida":
            cursor.deleteRow()

#### 3.3.1 Where Clause

We can use the optional `where_clause` parameter of the search and update cursors to specify a subset of rows to return. This parameter is passed into the cursor as a SQL (Structured Query Language) expression. SQL is a language used to query and manipulate databases:

In [None]:
with arcpy.da.SearchCursor("Centers", ["center_nam", "center_typ"], """"center_typ" = 'Job Center'""") as cursor:
    for row in cursor:
        print(f"{row[0]} is a {row[1]}")

The syntax for `where_clause` probably looks pretty strange. Fields in the `where_clause` (center_typ in this case) must typically be delimited by double quotes and the string we're searching for (Job Center in this case) must be delimited by single quotes. On top of this, the entire SQL statement must be passed into the cursor as a string. This leaves us with a few options. First, we can wrap the SQL statement in triple-quotes, as we did above. Second, we can use escape sequences for the double quotes we've got to include around the field name, like so:

In [None]:
with arcpy.da.SearchCursor("Centers", ["center_nam", "center_typ"], "\"center_typ\" = 'Job Center'") as cursor:
    for row in cursor:
        print(f"{row[0]} is a {row[1]}")

Finally, we can use `arcpy.AddFieldDelimiters()`. This function wraps the field name in the proper delimiters, and we can assign it to a variable that we pass into the cursor:

In [None]:
fieldname = "center_typ"
fc = "Centers"
delimfield = arcpy.AddFieldDelimiters(fc, fieldname)
sqlwhere = delimfield + " = 'Job Center'"

with arcpy.da.SearchCursor(fc, ["center_nam", "center_typ"], sqlwhere) as cursor:
    for row in cursor:
        print(f"{row[0]} is a {row[1]}")

Doing it this way may seem like a bit more work, but it does have an added benefit: While fields from shapefiles and feature classes in file geodatabases must be delimited by double-quotes, we don't use any delimiters when dealing with fields from enterprise geodatabases. `arcpy.AddFieldDelimiters()` ensures that we're using the proper delimiters no matter what sort of dataset we're dealing with.
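One more detail is worth handling when we build a where clause from a variable: a single quote inside the search value (as in a name like `Children's Center`) ends the SQL string early. Standard SQL escapes it by doubling it. The `equals_clause()` helper below is a hypothetical plain-Python sketch. It takes an already-delimited field name, which in the notebook would come from `arcpy.AddFieldDelimiters()` rather than being hard-coded as it is here.

```python
def equals_clause(delimited_field, value):
    """Build a <field> = '<value>' where clause, doubling any single quotes in value."""
    escaped = value.replace("'", "''")
    return f"{delimited_field} = '{escaped}'"


# '"center_typ"' is the double-quoted form used for shapefile and file geodatabase fields
print(equals_clause('"center_typ"', "Job Center"))
print(equals_clause('"center_nam"', "Children's Center"))
```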

#### 3.3.2 SQL Clause

The search and update cursors also have `sql_clause` as an optional parameter. This parameter lets us do a few other SQL operations like sorting and returning only unique values:

In [None]:
fc = "Centers"
fields = ["center_nam", "center_typ"]
sql = (None, "ORDER BY center_nam")

with arcpy.da.SearchCursor(fc, fields, sql_clause = sql) as cursor:
    for row in cursor:
        print(f"{row[0]} is a {row[1]}")

The `sql_clause` is a tuple of two values: a prefix clause and a postfix clause. These names refer to where each clause sits in a typical SQL statement. `ORDER BY` is a typical postfix clause and `DISTINCT` is a typical prefix clause. `ORDER BY` sorts our results, as we saw above, and `DISTINCT` returns only unique values:

In [None]:
fc = "Centers"
fields = ["center_typ"]
sql = ("DISTINCT", "ORDER BY center_typ")

with arcpy.da.SearchCursor(fc, fields, sql_clause = sql) as cursor:
    for row in cursor:
        print(f"{row[0]}")
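
To picture where the prefix and postfix land, the sketch below assembles the kind of SELECT statement these clauses correspond to as a plain string. It is only an illustration; ArcPy builds and runs the real query internally, and `build_query()` is a hypothetical helper.

```python
def build_query(table, fields, where_clause=None, sql_clause=(None, None)):
    """Hypothetical illustration: place sql_clause's prefix and postfix in a SELECT statement."""
    prefix, postfix = sql_clause
    parts = ["SELECT"]
    if prefix:
        parts.append(prefix)  # e.g. DISTINCT, right after SELECT
    parts.append(", ".join(fields))
    parts.append(f"FROM {table}")
    if where_clause:
        parts.append(f"WHERE {where_clause}")
    if postfix:
        parts.append(postfix)  # e.g. ORDER BY, at the very end
    return " ".join(parts)


query = build_query("Centers", ["center_typ"], sql_clause=("DISTINCT", "ORDER BY center_typ"))
print(query)
```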

#### 3.3.3 Table and Field Names

It's often a good idea to make sure that any new dataset or field we create has a valid and unique name. ArcPy has several functions for doing this. `arcpy.ValidateTableName()` takes a table name as input and returns a modified version of it if it isn't valid. It will just return the input table name if it's already valid:

In [None]:
arcpy.ValidateTableName("new feature class")

The example name used above includes spaces. Spaces aren't allowed, so `arcpy.ValidateTableName()` replaces them with underscores.

A workspace can be passed into `arcpy.ValidateTableName()` as an optional parameter if the workspace we're interested in creating a new dataset in isn't the current workspace. This is useful when we're reading in data from one workspace and writing out to another.

`arcpy.ValidateFieldName()` works similarly, but for field names:

In [None]:
fieldname = arcpy.ValidateFieldName("new&field*")
print(fieldname)

Once again, invalid characters are replaced with underscores.

Neither of these functions will check for the existence of datasets or fields with the same name, so it's entirely possible for our script to run into an error or overwrite existing data without an additional check for uniqueness. One way to do this is to make a list of fields or feature classes and make sure our name isn't in it: 

In [None]:
fc = "Centers"
fieldname = "created_by"
fieldlist = []
for field in arcpy.ListFields(fc):
    fieldlist.append(field.name)
    
if fieldname in fieldlist:
    print("Error: field name already exists in dataset")
else:
    print("Field name is unique")

The technique above is our only option for field names. For dataset names, we can use `arcpy.CreateUniqueName()`. This function will append a number to the end of a proposed name if it already exists in our workspace, or do nothing if our proposed name doesn't exist:

In [None]:
arcpy.CreateUniqueName("Centers")
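
Because `arcpy.CreateUniqueName()` is meant for dataset names, a small helper can play a similar role for field names. The sketch below is hypothetical: it appends an increasing number until the name is no longer in the list, and it compares names case-insensitively on the assumption that a data source won't allow two fields whose names differ only in case. It also ignores length limits; shapefile field names, for example, can be at most 10 characters. With ArcPy, the list would be the field names built from `arcpy.ListFields()` above.

```python
def unique_field_name(name, existing):
    """Return name, or name with a number appended, so it doesn't clash with existing."""
    taken = {n.lower() for n in existing}  # assumption: field names are case-insensitive
    candidate = name
    number = 0
    while candidate.lower() in taken:
        candidate = f"{name}{number}"
        number += 1
    return candidate


# Hypothetical field list
fieldlist = ["FID", "Shape", "center_nam", "status", "STATUS0"]
print(unique_field_name("status", fieldlist))
print(unique_field_name("notes", fieldlist))
```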

## 4 Creating Script Tools

## Resources

https://docs.python.org/3/library/index.html