Conversation

@sourcery-ai sourcery-ai bot commented Apr 6, 2022

Branch master refactored by Sourcery.

If you're happy with these changes, merge this Pull Request using the Squash and merge strategy.

See our documentation here.

Run Sourcery locally

Reduce the feedback loop during development by using the Sourcery editor plugin.

Review changes via command line

To manually merge these changes, make sure you're on the master branch, then run:

git fetch origin sourcery/master
git merge --ff-only FETCH_HEAD
git reset HEAD^

Help us improve this pull request!

@sourcery-ai sourcery-ai bot requested a review from pruthvirajg April 6, 2022 09:25
Comment on lines -193 to -198
-if years_experience < 3.0:
+if years_experience < 3.0 or years_experience >= 8.5:
     return "paid"
-elif years_experience < 8.5:
-    return "unpaid"
-else:
-    return "paid"

Function predict_paid_or_unpaid refactored with the following changes:
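For reference, the merged condition keeps the original three-way mapping (a minimal runnable sketch; the signature comes from the diff, and the final `return "unpaid"` fall-through is assumed from context):

```python
def predict_paid_or_unpaid(years_experience):
    # "paid" below 3.0 years or at/above 8.5 years; "unpaid" in between
    if years_experience < 3.0 or years_experience >= 8.5:
        return "paid"
    return "unpaid"
```

Both versions return "paid" for 2.0 and 9.0 years of experience, and "unpaid" for 5.0.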

 def load_data(filepath):
-    data = pd.read_csv(filepath)
-    return data
+    return pd.read_csv(filepath)

Function load_data refactored with the following changes:

Comment on lines -8 to +11
-dataset = list()
+dataset = []
 with open(filename, 'r') as file:
     csv_reader = reader(file)
-    for row in csv_reader:
-        if not row:
-            continue
-        dataset.append(row)
+    dataset.extend(row for row in csv_reader if row)

Function load_csv refactored with the following changes:
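The `extend` with a generator expression skips falsy (empty) rows in a single pass. A sketch of the refactored loop, reading from `StringIO` instead of a file on disk (`load_csv_lines` is a hypothetical name used here so the example can take any file-like object):

```python
from csv import reader
from io import StringIO

def load_csv_lines(file):
    # extend consumes the generator, appending only non-empty rows
    dataset = []
    csv_reader = reader(file)
    dataset.extend(row for row in csv_reader if row)
    return dataset

# a blank line in the input yields an empty row, which is filtered out
rows = load_csv_lines(StringIO("a,b\n\n1,2\n"))
```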

Comment on lines -29 to +25
-lookup = dict()
-
-for i, value in enumerate(unique):
-    lookup[value] = i
+lookup = {value: i for i, value in enumerate(unique)}

Function str_columm_to_int refactored with the following changes:
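The dict comprehension is behavior-for-behavior equivalent to the loop; a quick check (the `unique` labels are made up for illustration):

```python
unique = ["rock", "mine"]  # hypothetical class labels

# loop version from before the refactor
lookup_loop = {}
for i, value in enumerate(unique):
    lookup_loop[value] = i

# comprehension version from the diff
lookup = {value: i for i, value in enumerate(unique)}
```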

Comment on lines -42 to +40
-dataset_split = list()
+dataset_split = []
 dataset_copy = list(dataset)
 fold_size = int(len(dataset) / k_folds)

-for i in range(k_folds):
-    fold = list()
+for _ in range(k_folds):
+    fold = []

Function cross_validation_split refactored with the following changes:
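A runnable sketch of the whole function after the refactor; the inner sampling loop is not shown in the diff and is assumed here to follow the usual without-replacement pattern:

```python
import random

def cross_validation_split(dataset, k_folds):
    # split dataset into k folds of equal size, sampling without replacement
    dataset_split = []
    dataset_copy = list(dataset)
    fold_size = int(len(dataset) / k_folds)
    for _ in range(k_folds):
        fold = []
        while len(fold) < fold_size:
            # pop a random remaining element into the current fold
            index = random.randrange(len(dataset_copy))
            fold.append(dataset_copy.pop(index))
        dataset_split.append(fold)
    return dataset_split

folds = cross_validation_split(list(range(10)), 5)
```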

Comment on lines -53 to +56
-else:
-    # if even, return the average of the middle values
-    lo = midpoint - 1
-    hi = midpoint
-    return (sorted_v[lo] + sorted_v[hi]) / 2
+# if even, return the average of the middle values
+lo = midpoint - 1
+hi = midpoint
+return (sorted_v[lo] + sorted_v[hi]) / 2

Function median refactored with the following changes:
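Putting the dedented tail in context, a sketch of `median` after the refactor (the odd-length branch above the diff is not shown and is reconstructed here as an assumption):

```python
def median(v):
    sorted_v = sorted(v)
    n = len(v)
    midpoint = n // 2
    if n % 2 == 1:
        # odd length: the middle element (assumed branch, not in the diff)
        return sorted_v[midpoint]
    # if even, return the average of the middle values
    lo = midpoint - 1
    hi = midpoint
    return (sorted_v[lo] + sorted_v[hi]) / 2
```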

-i_points = [p for p, a in zip(inputs, assignments) if a == i]
-
-if i_points:
+if i_points := [p for p, a in zip(inputs, assignments) if a == i]:

Function KMeans.train refactored with the following changes:
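The walrus operator (`:=`, Python 3.8+) binds the list and truth-tests it in one expression. A self-contained before/after comparison with made-up points and cluster assignments:

```python
inputs = [(0, 0), (1, 1), (10, 10)]
assignments = [0, 0, 1]
i = 1

# before: build the list, then test it separately
i_points = [p for p, a in zip(inputs, assignments) if a == i]
if i_points:
    before = i_points

# after: assignment expression binds and tests in one step
if i_points := [p for p, a in zip(inputs, assignments) if a == i]:
    after = i_points
```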

Comment on lines -107 to +105
-if is_leaf(cluster):
-    return float('inf')
-else:
-    return cluster[0]
+return float('inf') if is_leaf(cluster) else cluster[0]

Function get_merge_order refactored with the following changes:

Comment on lines -115 to +126
-clusters = [input for input in inputs]
+clusters = list(inputs)

 # as long as we have more than one cluster left...
 while len(clusters) > 1:
     # find the two closest clusters
-    c1, c2 = min([(cluster1, cluster2)
-                  for i, cluster1 in enumerate(clusters)
-                  for cluster2 in clusters[:i]],
-                 key=lambda p: cluster_distance(p[0], p[1], distance_agg))
+    c1, c2 = min(
+        (
+            (cluster1, cluster2)
+            for i, cluster1 in enumerate(clusters)
+            for cluster2 in clusters[:i]
+        ),
+        key=lambda p: cluster_distance(p[0], p[1], distance_agg),
+    )

     # remove them from the list of clusters
-    clusters = [c for c in clusters if c != c1 and c != c2]
+    clusters = [c for c in clusters if c not in [c1, c2]]

Function bottom_up_cluster refactored with the following changes:

-    return winner  # unique winner, so return it
-else:
-    return majority_vote(labels[:-1])  # try again without the farthest
+return winner if num_winners == 1 else majority_vote(labels[:-1])

Function majority_vote refactored with the following changes:

This removes the following comments ( why? ):

# unique winner, so return it
# try again without the farthest
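A sketch of the refactored function with those comments folded into one line; the `Counter` bookkeeping above the diffed lines is not shown and is assumed here:

```python
from collections import Counter

def majority_vote(labels):
    """labels are assumed to be ordered from nearest to farthest"""
    vote_counts = Counter(labels)
    winner, winner_count = vote_counts.most_common(1)[0]
    num_winners = len([count for count in vote_counts.values()
                       if count == winner_count])
    # unique winner, so return it; otherwise try again without the farthest
    return winner if num_winners == 1 else majority_vote(labels[:-1])
```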

Comment on lines -94 to +91
-plt.title(str(k) + "-Nearest Neighbor Programming Languages")
+plt.title(f'{str(k)}-Nearest Neighbor Programming Languages')

Function classify_and_plot_grid refactored with the following changes:


 def estimate_beta(x, y):
-    beta_initial = [random.random() for x_i in x[0]]
+    beta_initial = [random.random() for _ in x[0]]

Function estimate_beta refactored with the following changes:

"""use gradient descent to fit a ridge regression
with penalty alpha"""
beta_initial = [random.random() for x_i in x[0]]
beta_initial = [random.random() for _ in x[0]]

Function estimate_beta_ridge refactored with the following changes:

-    return random.randrange(1, y)
-else:
-    return random.randrange(y - 6, 7)
+return random.randrange(1, y) if y <= 7 else random.randrange(y - 6, 7)

Function random_x_given_y refactored with the following changes:
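A sketch in context; only the return expression comes from the diff, and the reading of it as sampling a die roll `x` conditioned on a two-dice sum `y` is an assumption:

```python
import random

def random_x_given_y(y):
    # pick x uniformly from the values consistent with the sum y:
    # for small sums, x ranges 1..y-1; for large sums, x ranges y-6..6
    return random.randrange(1, y) if y <= 7 else random.randrange(y - 6, 7)
```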

Comment on lines -174 to +171
-distinct_words = set(word
-                     for document in documents
-                     for word in document)
+distinct_words = {word for document in documents for word in document}

Lines 174-176 refactored with the following changes:
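The set comprehension produces the same set as `set()` over a generator; a quick check on made-up documents:

```python
documents = [["data", "science"], ["science", "big", "data"]]

# nested comprehension flattens the documents and deduplicates the words
distinct_words = {word for document in documents for word in document}
```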

Comment on lines -206 to +204
-document_topics = [[random.randrange(K) for word in document]
-                   for document in documents]
+document_topics = [
+    [random.randrange(K) for _ in document] for document in documents
+]


Lines 206-215 refactored with the following changes:

-    return try_or_none(parser)(value)
-else:
-    return value
+return try_or_none(parser)(value) if parser is not None else value

Function try_parse_field refactored with the following changes:
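A runnable sketch; the `try_or_none` wrapper body and the exact `try_parse_field` signature are assumed, only the return expression comes from the diff:

```python
import functools

def try_or_none(f):
    # wrap f so a parsing failure yields None instead of raising
    @functools.wraps(f)
    def f_or_none(x):
        try:
            return f(x)
        except Exception:
            return None
    return f_or_none

def try_parse_field(value, parser):
    # apply the parser when one is given; pass the value through otherwise
    return try_or_none(parser)(value) if parser is not None else value
```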


sourcery-ai bot commented Apr 6, 2022

Sourcery Code Quality Report

✅  Merging this PR will increase code quality in the affected files by 0.12%.

| Quality metrics | Before | After | Change |
| --- | --- | --- | --- |
| Complexity | 2.68 ⭐ | 2.59 ⭐ | -0.09 👍 |
| Method Length | 37.44 ⭐ | 37.24 ⭐ | -0.20 👍 |
| Working memory | 6.33 🙂 | 6.30 🙂 | -0.03 👍 |
| Quality | 80.38% | 80.50% | 0.12% 👍 |

| Other metrics | Before | After | Change |
| --- | --- | --- | --- |
| Lines | 1571 | 1548 | -23 |
| Changed files | Quality Before | Quality After | Quality Change |
| --- | --- | --- | --- |
| friendster_network.py | 87.86% ⭐ | 87.81% ⭐ | -0.05% 👎 |
| hparams_grid_search_keras_nn.py | 65.54% 🙂 | 65.04% 🙂 | -0.50% 👎 |
| sonar_clf_rf.py | 73.60% 🙂 | 73.49% 🙂 | -0.11% 👎 |
| helpers/gradient_descent.py | 79.84% ⭐ | 80.27% ⭐ | 0.43% 👍 |
| helpers/probabilty.py | 75.54% ⭐ | 75.67% ⭐ | 0.13% 👍 |
| helpers/stats.py | 84.13% ⭐ | 84.80% ⭐ | 0.67% 👍 |
| k_means_clustering/utils.py | 81.74% ⭐ | 81.69% ⭐ | -0.05% 👎 |
| k_nearest_neighbors/utils.py | 75.29% ⭐ | 75.16% ⭐ | -0.13% 👎 |
| multiple_regression/utils.py | 91.81% ⭐ | 92.17% ⭐ | 0.36% 👍 |
| natural_language_processing/utils.py | 82.96% ⭐ | 82.83% ⭐ | -0.13% 👎 |
| working_with_data/utils.py | 85.04% ⭐ | 85.00% ⭐ | -0.04% 👎 |

Here are some functions in these files that still need a tune-up:

| File | Function | Complexity | Length | Working Memory | Quality | Recommendation |
| --- | --- | --- | --- | --- | --- | --- |
| working_with_data/utils.py | make_scatterplot_matrix | 15 🙂 | 210 ⛔ | 7 🙂 | 50.23% 🙂 | Try splitting into smaller methods |
| sonar_clf_rf.py | get_split | 9 🙂 | 103 🙂 | 13 😞 | 54.61% 🙂 | Extract out complex expressions |
| helpers/gradient_descent.py | minimize_stochastic | 6 ⭐ | 103 🙂 | 13 😞 | 57.46% 🙂 | Extract out complex expressions |
| sonar_clf_rf.py | split | 9 🙂 | 123 😞 | 9 🙂 | 59.28% 🙂 | Try splitting into smaller methods |
| k_nearest_neighbors/utils.py | classify_and_plot_grid | 4 ⭐ | 137 😞 | 9 🙂 | 62.43% 🙂 | Try splitting into smaller methods |

Legend and Explanation

The emojis denote the absolute quality of the code:

  • ⭐ excellent
  • 🙂 good
  • 😞 poor
  • ⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.


Please see our documentation here for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Help us improve this quality report!

